Agents
Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning
The paper introduces QGF (Q-Guided Flow), a reinforcement learning algorithm that performs policy optimization entirely at test time by guiding a pre-trained reference flow policy with the gradients of a value function critic. On single-task and goal-conditioned offline RL benchmarks, QGF outperforms existing test-time methods and remains competitive with state-of-the-art training-time algorithms, while being more computationally efficient. For practitioners working with high-dimensional action spaces, it offers a stable and scalable alternative that minimizes the instability associated with traditional actor-critic training.
reinforcement learning, policy optimization, test-time
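
The summary does not state the QGF update rule, so the following is only a minimal sketch of what test-time gradient guidance of a flow policy can look like, not the paper's algorithm. It assumes a toy reference velocity field, a quadratic critic with a closed-form action gradient, and a simple additive guidance term inside Euler integration. Every name here (`reference_velocity`, `q_value`, `q_grad_action`, `guided_sample`, `A_STAR`, `guidance_scale`) is hypothetical and chosen for illustration.

```python
import numpy as np

# Hypothetical action that the toy critic prefers (an assumption for this sketch).
A_STAR = np.array([1.0, 1.0])


def reference_velocity(state, action, t):
    """Toy stand-in for a pre-trained flow policy's velocity field.

    A pure translation flow: it moves the initial noise by tanh(state)
    over t in [0, 1]. A real flow policy would be a learned network.
    """
    return np.tanh(state)


def q_value(state, action):
    """Toy critic: highest at A_STAR, independent of the state."""
    return -0.5 * np.sum((action - A_STAR) ** 2)


def q_grad_action(state, action):
    """Closed-form gradient of the toy critic with respect to the action."""
    return -(action - A_STAR)


def guided_sample(state, noise, guidance_scale=1.0, num_steps=10):
    """Euler-integrate the reference flow, adding a critic-gradient term.

    The additive form `v + guidance_scale * grad_a Q` is an assumed,
    generic guidance rule; no parameters are trained at any point.
    """
    dt = 1.0 / num_steps
    action = noise.copy()
    for k in range(num_steps):
        t = k * dt
        v = reference_velocity(state, action, t)
        g = q_grad_action(state, action)
        action = action + dt * (v + guidance_scale * g)
    return action


state = np.array([0.5, -0.5])
noise = np.zeros(2)
plain = guided_sample(state, noise, guidance_scale=0.0)   # reference policy only
guided = guided_sample(state, noise, guidance_scale=1.0)  # critic-guided
print("Q (reference):", q_value(state, plain))
print("Q (guided):   ", q_value(state, guided))
```

With `guidance_scale=0.0` the sampler reduces to the reference flow, which makes the guidance term easy to ablate. In this toy setup the guided action scores higher under the critic, but the scale is a knob: large values can push samples far from what the reference policy would produce.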