Agents · arXiv cs.AI · 15 d ago

Human-like autonomy emerges from self-play and a pinch of human data

The paper presents a reinforcement learning approach that combines self-play with a small amount of human driving data (30 minutes) to train driving policies. Using a safe goal-reaching reward, the method trains an effective policy in 15 hours on a single consumer-grade GPU and relies far less on extensive human demonstrations than traditional imitation learning does. For practitioners, the result is driving behavior that aligns more closely with human norms without giving up training efficiency.
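The summary does not say how the human data and the self-play objective are combined. One common pattern, shown here only as an illustrative sketch and not as the paper's method, is to add a weighted imitation term to the reinforcement learning loss. Every name, default value, and the reward shape below is a hypothetical assumption.

```python
import math
from typing import Sequence


def goal_reaching_reward(reached_goal: bool, collided: bool,
                         goal_bonus: float = 1.0,
                         collision_penalty: float = 1.0) -> float:
    """Hypothetical sparse reward: bonus for reaching the goal, penalty for a collision."""
    reward = 0.0
    if reached_goal:
        reward += goal_bonus
    if collided:
        reward -= collision_penalty
    return reward


def imitation_loss(human_action_probs: Sequence[float]) -> float:
    """Mean negative log-likelihood the policy assigns to logged human actions."""
    if not human_action_probs:
        return 0.0
    return -sum(math.log(p) for p in human_action_probs) / len(human_action_probs)


def combined_loss(rl_loss: float, human_action_probs: Sequence[float],
                  imitation_weight: float = 0.1) -> float:
    """Assumed objective: self-play RL loss plus a weighted imitation term."""
    return rl_loss + imitation_weight * imitation_loss(human_action_probs)
```

In this sketch, `imitation_weight` controls how strongly the small human dataset pulls the self-play policy toward human-like actions. Setting it to zero recovers pure self-play.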

Tags: self-play, reinforcement learning, human data
