Training
When Life Gives You BC, Make Q-functions: Extracting Q-values from Behavior Cloning for On-Robot Reinforcement Learning
The paper introduces Q2RL, an algorithm that combines two components, Q-Estimation and Q-Gating, to improve a Behavior Cloning (BC) policy through on-robot reinforcement learning. Q-Estimation extracts a Q-function from the BC policy using minimal interaction; Q-Gating uses Q-values to switch between the BC action and the reinforcement learning action. The authors report significant improvements over state-of-the-art offline-to-online learning methods, with success rates of up to 100% on complex manipulation tasks and rapid convergence, which makes the method well suited to on-robot reinforcement learning.
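The gating idea described above can be illustrated with a minimal sketch. It is not the paper's implementation: the summary only says that Q-Gating switches between BC and reinforcement learning actions based on Q-values, so the rule used here (take the RL action only when its estimated Q-value exceeds the BC action's by a margin) is an assumption, and the names `q_gate`, `toy_q`, and `margin` are hypothetical.

```python
def q_gate(q_fn, state, bc_action, rl_action, margin=0.0):
    """Pick between a BC action and an RL action using estimated Q-values.

    Assumed rule (not taken from the paper): use the RL action only if its
    Q-value exceeds the BC action's Q-value by more than `margin`.
    """
    q_bc = q_fn(state, bc_action)
    q_rl = q_fn(state, rl_action)
    if q_rl > q_bc + margin:
        return rl_action, "rl"
    return bc_action, "bc"


def toy_q(state, action):
    """Hypothetical stand-in for a learned Q-function.

    Scores an action by its negative squared distance to the state,
    so actions closer to the state receive higher values.
    """
    return -sum((a - s) ** 2 for a, s in zip(action, state))


state = [0.5, -0.2]
bc_action = [0.4, -0.1]   # proposal from the BC policy
rl_action = [0.5, -0.2]   # proposal from the RL policy

chosen, source = q_gate(toy_q, state, bc_action, rl_action)
print(source, chosen)
```

A positive `margin` makes the gate more conservative: the BC action is kept unless the RL action looks clearly better under the Q-estimate, which may be a reasonable default early in on-robot training when the Q-function is still noisy.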
reinforcement-learning, behavior-cloning, robotics