Agents
Human-like autonomy emerges from self-play and a pinch of human data
The article presents a reinforcement learning approach that combines a small amount of human driving data (30 minutes) with self-play to train driving policies. Using a safe goal-reaching reward, the method trains an effective policy in 15 hours on a single consumer-grade GPU and needs far fewer human demonstrations than traditional imitation learning. For practitioners, the result is driving behavior that aligns more closely with human norms without sacrificing training efficiency.
self-play, reinforcement learning, human data
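
The summary does not spell out the reward or how the human data enters training, so the following is only a rough sketch of what a "safe goal-reaching reward" combined with a small human-data term could look like. Every name and value here (`StepOutcome`, `goal_reaching_reward`, `combined_loss`, the penalty sizes, `human_weight`) is a hypothetical assumption for illustration, not the article's actual formulation.

```python
from dataclasses import dataclass


@dataclass
class StepOutcome:
    """Hypothetical per-step summary of what happened to one agent."""
    reached_goal: bool
    collided: bool
    off_road: bool


def goal_reaching_reward(
    outcome: StepOutcome,
    goal_bonus: float = 1.0,         # assumed value
    collision_penalty: float = 1.0,  # assumed value
    off_road_penalty: float = 1.0,   # assumed value
) -> float:
    """Reward reaching the goal, penalize unsafe events (illustrative only)."""
    reward = 0.0
    if outcome.reached_goal:
        reward += goal_bonus
    if outcome.collided:
        reward -= collision_penalty
    if outcome.off_road:
        reward -= off_road_penalty
    return reward


def combined_loss(rl_loss: float, human_data_loss: float, human_weight: float = 0.5) -> float:
    """Assumed way to mix a self-play RL loss with a small human-data term."""
    return rl_loss + human_weight * human_data_loss
```

In this sketch, `human_weight` controls how strongly the small human dataset pulls the policy toward human-like behavior; the right value, and whether the article uses a weighted sum at all, is not stated in the summary.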