Training
Deep Q-Learning on Hölder Spaces
The paper analyzes Q-learning for continuous-time stochastic control in a diffusion setting, focusing on the regularity and approximation complexity of the Bellman optimality target. It introduces a tensor-product DeepONet architecture suited to the problem's mixed regularity, and shows that Bellman updates preserve Lipschitz dependence on the action while improving regularity in the state variable. Its contributions to Q-learning theory are approximation bounds and a stiffness-complexity trade-off, both relevant to practitioners designing algorithms for continuous environments. It stops short of a full convergence theorem for practical implementations.
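To make the phrase "tensor-product architecture" concrete, here is a minimal sketch of a separable Q-function of the form Q(x, a) ≈ Σ_k s_k(x) · t_k(a), where one small network sees only the state and another sees only the action. This is an illustration under stated assumptions, not the paper's architecture: the summary above does not specify layer widths, activations, or how the two factors are built, so every name here (`make_mlp`, `q_value`, `greedy_action`, the rank `P`, the tanh activation) is hypothetical, and the weights are random rather than trained.

```python
import math
import random


def make_mlp(sizes, rng):
    """Random weights for a small MLP; sizes = [n_in, hidden..., n_out]."""
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        scale = 1.0 / math.sqrt(n_in)
        weights = [[rng.gauss(0.0, scale) for _ in range(n_in)]
                   for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def mlp_forward(layers, x):
    """Apply the MLP with tanh on hidden layers and a linear output layer."""
    h = list(x)
    for i, (weights, biases) in enumerate(layers):
        h = [sum(w * v for w, v in zip(row, h)) + b
             for row, b in zip(weights, biases)]
        if i < len(layers) - 1:
            h = [math.tanh(v) for v in h]
    return h


def q_value(state_net, action_net, state, action):
    """Separable form: inner product of state features and action features."""
    s = mlp_forward(state_net, state)
    t = mlp_forward(action_net, action)
    return sum(sk * tk for sk, tk in zip(s, t))


def greedy_action(state_net, action_net, state, candidate_actions):
    """Pick the candidate with the largest Q-value on a finite action grid."""
    s = mlp_forward(state_net, state)  # state features computed once

    def score(action):
        t = mlp_forward(action_net, action)
        return sum(sk * tk for sk, tk in zip(s, t))

    return max(candidate_actions, key=score)


# Hypothetical sizes: 2-D state, 1-D action, rank P = 8 (all assumptions).
rng = random.Random(0)
P = 8
state_net = make_mlp([2, 16, P], rng)
action_net = make_mlp([1, 16, P], rng)

state = [0.3, -0.1]
actions = [[a / 10.0] for a in range(-10, 11)]  # grid on [-1, 1]
best = greedy_action(state_net, action_net, state, actions)
```

One reason such a factorization can be convenient is that the state and action factors can have different capacity, which loosely mirrors the mixed regularity described above (Lipschitz in the action, smoother in the state). Whether the paper exploits the structure in this way is not stated in the summary.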
deep q-learning, reinforcement learning, continuous time