Training · arXiv cs.AI — 12 d ago

Reversal Q-Learning

The article presents Reversal Q-Learning (RQL), an off-policy reinforcement learning algorithm that uses flow policies trained on prior data within an "expanded" Markov decision process. RQL generates virtual on-policy trajectories by reversing flows and applies bias- and variance-reduction techniques to address the curse of horizon in off-policy RL. In the reported experiments, RQL outperforms existing flow-based offline RL methods across 50 simulated robotic tasks.

reinforcement learning · off-policy · flow policy
