ai-digest.dev
Agents · arXiv cs.AI · 21 h ago

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

The paper introduces QGF (Q-Guided Flow), a reinforcement learning algorithm that optimizes policies entirely at test time, using a pre-trained reference flow policy and a value-function critic. On single-task and goal-conditioned offline RL benchmarks, QGF outperforms existing test-time methods and remains competitive with state-of-the-art training-time algorithms, while also being more computationally efficient. For practitioners working with high-dimensional action spaces, the approach offers a stable, scalable alternative that reduces the instability associated with traditional actor-critic training.
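The summary does not spell out QGF's update rule. To illustrate the general idea named in the title, here is a minimal Python sketch that steers a flow policy's sampling with the critic's action gradient at test time, in the style of classifier guidance. Everything in it is an assumption: the toy velocity field `reference_velocity`, the quadratic critic `q_value`, the Euler integrator, and the `guidance_scale` knob are hypothetical stand-ins, not the paper's models or its actual guidance rule. The state input is omitted for brevity.

```python
import numpy as np


def reference_velocity(a, t, mu):
    # Hypothetical toy "reference flow": a velocity field that pulls actions toward mu.
    # t is unused here but kept to mirror the usual v(a, t) signature of flow policies.
    return mu - a


def q_value(a, goal):
    # Hypothetical toy critic: actions closer to `goal` have higher value.
    return -float(np.sum((a - goal) ** 2))


def grad_q(a, goal):
    # Analytic gradient of the toy critic with respect to the action.
    return -2.0 * (a - goal)


def guided_flow_sample(a0, mu, goal, guidance_scale=0.0, num_steps=50):
    """Integrate the reference flow with explicit Euler steps, adding
    guidance_scale * dQ/da to the velocity at every step (no retraining)."""
    a = np.asarray(a0, dtype=float).copy()
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        velocity = reference_velocity(a, t, mu)
        a = a + dt * (velocity + guidance_scale * grad_q(a, goal))
    return a


# Usage: same starting noise, with and without critic guidance.
mu = np.array([1.0, 0.0])    # where the toy reference policy sends actions
goal = np.array([0.0, 1.0])  # where the toy critic assigns the highest value
a0 = np.zeros(2)             # fixed initial sample for reproducibility

plain = guided_flow_sample(a0, mu, goal, guidance_scale=0.0)
guided = guided_flow_sample(a0, mu, goal, guidance_scale=1.0)
print("unguided:", plain, q_value(plain, goal))
print("guided:  ", guided, q_value(guided, goal))
```

With `guidance_scale=0.0` the sampler simply follows the reference flow; raising it trades closeness to the reference policy for higher critic value, a trade-off that gradient-guidance methods generally have to manage. The step size matters too: in this toy field, explicit Euler stays stable only while `dt * (1 + 2 * guidance_scale) < 2`.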

reinforcement learning · policy optimization · test-time