Training
Reversal Q-Learning
The article presents Reversal Q-Learning (RQL), an off-policy reinforcement learning algorithm that uses flow policies trained on prior data within an "expanded" Markov decision process framework. RQL generates virtual on-policy trajectories by reversing flows and applies bias- and variance-reduction techniques to address the curse of horizon in off-policy RL. In experiments, RQL outperforms existing flow-based offline RL methods across 50 simulated robotic tasks.
reinforcement learning, off-policy, flow policy