Agents
Direction-Conditioned Policies via Compositional Subgoal Scoring for Online Goal-Conditioned Reinforcement Learning
The article introduces Direction-Conditioned Policies (DCP), an online method for goal-conditioned reinforcement learning that improves action selection by splitting goal-reaching into two parts: a subgoal-scoring step and a direction-conditioned actor. DCP uses a shared InfoNCE-trained representation that aligns visited states with final goals. Across nine environments it outperforms Contrastive RL, most notably on manipulation and obstacle-interaction tasks. For practitioners, the takeaway is that conditioning action selection on a geometrically informative signal can make goal-directed learning more efficient and may lead to more robust agents in complex environments.
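To make the decomposition concrete, here is a minimal Python sketch of the three ingredients named above: an InfoNCE loss over paired state and goal embeddings, a subgoal-scoring step, and a direction signal for the actor. Only the InfoNCE loss is a standard, well-known formulation. The function names, the additive scoring rule in `score_subgoals`, and the unit-vector definition in `direction_to` are illustrative assumptions, not the article's actual formulas, which this summary does not give.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def info_nce_loss(state_embs, goal_embs):
    """Mean InfoNCE loss over a batch.

    goal_embs[i] is the positive for state_embs[i]; every other row in the
    batch acts as a negative.
    """
    total = 0.0
    for i, s in enumerate(state_embs):
        logits = [dot(s, g) for g in goal_embs]
        m = max(logits)  # subtract the max for numerical stability
        log_norm = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_norm - logits[i]
    return total / len(state_embs)


def score_subgoals(state_emb, candidate_embs, goal_emb):
    """ASSUMED scoring rule: similarity(state, candidate) + similarity(candidate, goal)."""
    return [dot(state_emb, c) + dot(c, goal_emb) for c in candidate_embs]


def direction_to(state_emb, subgoal_emb, eps=1e-8):
    """ASSUMED direction signal: unit vector from the state embedding to the subgoal embedding."""
    diff = [w - s for s, w in zip(state_emb, subgoal_emb)]
    norm = math.sqrt(dot(diff, diff))
    return [d / (norm + eps) for d in diff]


# Toy usage with hand-picked 2-D embeddings (hypothetical values).
state = [0.5, 0.0]
goal = [2.0, 0.0]
candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]

scores = score_subgoals(state, candidates, goal)
best = max(range(len(candidates)), key=lambda i: scores[i])
direction = direction_to(state, candidates[best])  # would be fed to the actor
```

In this toy setup the candidate lying between the state and the goal scores highest, and the resulting direction points from the state toward it. In a real agent the embeddings would come from the learned encoder, and the actor would be a trained network that takes the direction as input.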
reinforcement learning, goal conditioning