Training
Exploring Starts Are Not Enough: Counterexamples and a Fix for Monte Carlo Exploring Starts
This paper presents counterexamples showing that Monte Carlo Exploring Starts (MCES) can converge to a suboptimal solution in both its initial-visit and first-visit variants, addressing a key open question in reinforcement learning. It then introduces a modification of initial-visit MCES that scales the learning rate of each state inversely to how often that state is updated, which ensures convergence to the optimal solution, a property that is crucial for large-scale applications requiring value function approximation. These findings emphasize the importance of learning rate selection and of balancing exploration and exploitation in Monte Carlo control implementations.
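The learning-rate modification described above can be illustrated with a minimal sketch. It is not the paper's algorithm or one of its counterexamples. The two-state MDP, the function names, and the specific step size 1/N(s, a), where N(s, a) counts the updates applied to a start pair, are all assumptions chosen for illustration. The paper's exact scaling rule may differ.

```python
import numpy as np

# Hypothetical toy episodic MDP (not from the paper).
# TRANSITIONS[state][action] = (next_state, reward); TERMINAL ends the episode.
TERMINAL = -1
TRANSITIONS = {
    0: {0: (TERMINAL, 1.0), 1: (1, 0.0)},
    1: {0: (TERMINAL, 1.0), 1: (TERMINAL, 2.0)},
}


def rollout_return(state, action, q):
    """Undiscounted return from (state, action), acting greedily on q afterwards."""
    total = 0.0
    while True:
        state, reward = TRANSITIONS[state][action]
        total += reward
        if state == TERMINAL:
            return total
        action = int(np.argmax(q[state]))


def initial_visit_mces(num_episodes=2000, seed=0):
    """Initial-visit MCES sketch with an assumed count-based step size 1 / N(s, a)."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = 2, 2
    q = np.zeros((n_states, n_actions))
    counts = np.zeros((n_states, n_actions), dtype=int)
    for _ in range(num_episodes):
        # Exploring start: sample the initial state-action pair uniformly.
        s = int(rng.integers(n_states))
        a = int(rng.integers(n_actions))
        g = rollout_return(s, a, q)
        # Initial-visit update: only the starting pair is updated, and its
        # step size shrinks with the number of updates it has received.
        counts[s, a] += 1
        q[s, a] += (g - q[s, a]) / counts[s, a]
    return q, counts


q, counts = initial_visit_mces()
print("Q estimates:", q)
print("Greedy policy:", q.argmax(axis=1))
```

With a step size of 1/N(s, a), each estimate is the running average of the returns observed from that start pair, so pairs that are updated often do not drift faster than pairs that are updated rarely. A constant step size would be the natural baseline to compare against in this sketch.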
reinforcement-learning, monte-carlo