Training · arXiv cs.AI · 9 d ago

Exploring Starts Are Not Enough: Counterexamples and a Fix for Monte Carlo Exploring Starts

This paper presents counterexamples showing that Monte Carlo Exploring Starts (MCES) can converge to suboptimal solutions in both its initial-visit and first-visit variants, addressing a key open question in reinforcement learning. It also introduces a modification to initial-visit MCES that scales learning rates inversely to update frequencies on a state-by-state basis, which ensures convergence to optimality. That guarantee matters for large-scale applications that require value function approximation. The findings underline how much learning rate selection and the balance between exploration and exploitation matter in Monte Carlo control implementations.
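To make the idea of count-scaled learning rates concrete, here is a minimal sketch of initial-visit MCES on a toy problem. Everything in it is an assumption for illustration: the two-state MDP, the function names, and the step size `1 / counts[s][a]` are hypothetical, and the summary above does not give the paper's exact update rule, so this should be read as one plausible interpretation rather than the authors' method.

```python
import random

# Hypothetical toy MDP (not from the paper); state 2 is terminal.
# TRANSITIONS[state][action] = (next_state, reward)
TRANSITIONS = {
    0: {0: (1, 0.0), 1: (2, 1.0)},
    1: {0: (2, 0.0), 1: (2, 2.0)},
}
TERMINAL = 2


def greedy(q, state):
    """Return the highest-valued action; ties go to the lower action index."""
    return max(q[state], key=lambda a: (q[state][a], -a))


def rollout_return(q, state, action):
    """Take `action` in `state`, then act greedily; return the undiscounted return."""
    total = 0.0
    while True:
        state, reward = TRANSITIONS[state][action]
        total += reward
        if state == TERMINAL:
            return total
        action = greedy(q, state)


def mces_initial_visit(episodes=2000, seed=0):
    """Initial-visit MCES: only the starting (state, action) pair is updated."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in acts} for s, acts in TRANSITIONS.items()}
    counts = {s: {a: 0 for a in acts} for s, acts in TRANSITIONS.items()}
    for _ in range(episodes):
        # Exploring start: sample the initial state-action pair uniformly.
        s0 = rng.choice(list(TRANSITIONS))
        a0 = rng.choice(list(TRANSITIONS[s0]))
        g = rollout_return(q, s0, a0)
        # Assumed rule: step size shrinks with how often this pair was updated.
        counts[s0][a0] += 1
        alpha = 1.0 / counts[s0][a0]
        q[s0][a0] += alpha * (g - q[s0][a0])
    return q


if __name__ == "__main__":
    q = mces_initial_visit()
    print({s: greedy(q, s) for s in TRANSITIONS})
```

In this toy problem the optimal policy takes action 0 in state 0 and action 1 in state 1. The sketch only illustrates the mechanics of a per-pair step size; it does not reproduce the paper's counterexamples or its convergence argument.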

reinforcement-learning · monte-carlo
