Agents · arXiv cs.AI · 21 h ago

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

The paper introduces QGF (Q-Guided Flow), a reinforcement learning algorithm that optimizes the policy entirely at test time, using a pre-trained reference flow policy and a value-function critic. On single-task and goal-conditioned offline RL benchmarks, QGF outperforms existing test-time methods and remains competitive with state-of-the-art training-time algorithms, while being more computationally efficient. For practitioners working with high-dimensional action spaces, it offers a stable and scalable alternative that reduces the instability associated with traditional actor-critic training.
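The summary does not spell out QGF's update rule, so the following is only a toy sketch of the general idea of test-time gradient guidance, not the paper's method. It assumes the guidance is added to the flow's velocity during Euler integration. Every name here (`reference_velocity`, `q_value`, `q_grad_action`, `sample_action`, `guidance_scale`) is hypothetical, and both the flow policy and the critic are hand-written stand-ins for learned networks.

```python
import math

def reference_velocity(state, action, t):
    # Hypothetical stand-in for a pre-trained flow policy's velocity field:
    # it pulls the action toward a "behavior" action tanh(state).
    # t is unused in this toy field; a learned network would condition on it.
    return [math.tanh(s) - a for s, a in zip(state, action)]

def q_value(state, action):
    # Hypothetical critic: highest when action == -state.
    return -sum((a + s) ** 2 for s, a in zip(state, action))

def q_grad_action(state, action):
    # Analytic gradient of q_value with respect to the action.
    # With a learned critic this would come from automatic differentiation.
    return [-2.0 * (a + s) for s, a in zip(state, action)]

def sample_action(state, noise, guidance_scale=0.0, num_steps=10):
    # Euler-integrate the flow from t=0 to t=1, starting at the noise sample.
    # Assumption: guidance enters as an extra drift term, guidance_scale * dQ/da.
    action = list(noise)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        v = reference_velocity(state, action, t)
        g = q_grad_action(state, action)
        action = [a + dt * (vi + guidance_scale * gi)
                  for a, vi, gi in zip(action, v, g)]
    return action

if __name__ == "__main__":
    state = [0.5, -0.5]
    noise = [0.0, 0.0]
    plain = sample_action(state, noise, guidance_scale=0.0)
    guided = sample_action(state, noise, guidance_scale=1.0)
    print("Q without guidance:", q_value(state, plain))
    print("Q with guidance:   ", q_value(state, guided))
```

In this toy setup no parameters are trained: the only change between the two calls is `guidance_scale`, which is the sense in which the policy is adjusted at test time. Setting it to zero recovers the reference policy's sample for the same noise.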

reinforcement learning · policy optimization · test-time
