Scaling Self-Play for End-to-End Driving
Summary
The paper proposes a strategy for training end-to-end autonomous driving models that addresses two limitations of offline human-demonstration datasets: limited state coverage and compounding errors. The approach runs large-scale self-play directly from pixels in simulation. Its simulator, Gigapixel, is a high-throughput batched driving simulator that renders a simplified bounding-box world at 50k agent steps per second. Because direct pixel-space self-play RL is sample-inefficient, the method instead uses self-play DAgger training, which distills pixel-based policies from a privileged RL teacher. The policies are then transferred to real-world sensor data via lightweight perception adaptation. The strategy achieves competitive performance on the HUGSIM and NAVSIM-v2 benchmarks without human trajectory supervision, and the results indicate that scaling self-play training yields proportional performance gains.
Key takeaway
Machine learning engineers developing end-to-end autonomous driving systems should consider large-scale self-play DAgger training. The method combines a high-throughput simulator such as Gigapixel with on-policy distillation, offering a scalable alternative to human-demonstration datasets. It improves robustness to compounding errors and long-tail interactions, especially when combined with lightweight perception adaptation for real-world deployment.
Key insights
Self-play DAgger training in a high-throughput simulator enables scalable, robust end-to-end autonomous driving from pixels.
Principles
- Self-play overcomes offline human data limitations.
- Simplified simulation enables high-throughput pixel training.
- Scaling self-play yields proportional performance gains.
Method
Self-play DAgger training involves distilling pixel-based policies from a privileged RL teacher in a high-throughput simulator, followed by lightweight perception adaptation for sim-to-real transfer.
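The distillation loop can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration rather than the paper's implementation: the world is a 1-D lateral offset, the "pixels" are a 32-value strip rendered by a hypothetical `render` function, the privileged teacher is a hand-written controller standing in for an RL-trained one, and the student is a linear policy fit by least squares. What it does show is the DAgger pattern itself: the student drives, the teacher labels the states the student actually visits, and the student is refit on the aggregated data.

```python
import numpy as np

WIDTH = 32  # number of "pixels" in the toy observation (hypothetical)

def render(x):
    """Render lateral offset x in [-1, 1] as a 1-D image with a soft blob."""
    centers = np.linspace(-1.0, 1.0, WIDTH)
    return np.exp(-((centers - x) ** 2) / 0.02)

def teacher_action(x):
    """Privileged teacher: sees the true state and steers toward the centre.
    Stand-in for an RL-trained teacher; hand-written here for brevity."""
    return -0.5 * x

def rollout(weights, rng, steps=30):
    """Drive with the student policy; return visited images and true states."""
    x = rng.uniform(-0.8, 0.8)
    images, states = [], []
    for _ in range(steps):
        img = render(x)
        images.append(img)
        states.append(x)
        action = float(img @ weights)  # student acts from pixels only
        x = float(np.clip(x + action + rng.normal(0.0, 0.02), -1.0, 1.0))
    return np.array(images), np.array(states)

def dagger(iterations=5, episodes=20, seed=0):
    """Aggregate teacher labels on student-visited states, then refit."""
    rng = np.random.default_rng(seed)
    weights = np.zeros(WIDTH)  # untrained student outputs zero steering
    data_x, data_y = [], []
    for _ in range(iterations):
        for _ in range(episodes):
            images, states = rollout(weights, rng)    # on-policy data
            data_x.append(images)
            data_y.append(teacher_action(states))     # teacher relabels
        X = np.concatenate(data_x)
        y = np.concatenate(data_y)
        weights, *_ = np.linalg.lstsq(X, y, rcond=None)  # supervised refit
    return weights
```

Calling `dagger()` returns the student's weight vector; `render(x) @ weights` then gives the student's steering for an offset `x`. Because labels come from states reached under the student's own policy, the training distribution tracks the student's mistakes, which is the property that counters compounding errors.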
In practice
- Use simplified bounding-box worlds for high-throughput simulation.
- Employ on-policy distillation from a privileged RL teacher instead of sample-inefficient pixel-space RL.
- Adapt policies to real-world data via lightweight perception.
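To give a feel for why a bounding-box world is cheap to render in batches, here is a minimal sketch that rasterizes axis-aligned boxes for many worlds at once with NumPy broadcasting. It is not Gigapixel's renderer: the function name `rasterize_boxes`, the top-down occupancy view, the `(cx, cy, half_w, half_h)` box layout, and the grid size are all assumptions chosen to keep the example short.

```python
import numpy as np

def rasterize_boxes(boxes, size=64, extent=32.0):
    """Rasterize axis-aligned boxes into top-down occupancy images.

    boxes: float array of shape (B, N, 4) holding (cx, cy, half_w, half_h)
           in metres, in an ego-centred frame (hypothetical layout).
    Returns a uint8 array of shape (B, size, size) covering
    [-extent, extent] metres on both axes.
    """
    boxes = np.asarray(boxes, dtype=np.float64)
    # Metric coordinate of each pixel centre along one axis.
    coords = (np.arange(size) + 0.5) / size * 2.0 * extent - extent
    xs = coords[None, None, None, :]          # (1, 1, 1, W)
    ys = coords[None, None, :, None]          # (1, 1, H, 1)
    cx = boxes[..., 0][..., None, None]       # (B, N, 1, 1)
    cy = boxes[..., 1][..., None, None]
    hw = boxes[..., 2][..., None, None]
    hh = boxes[..., 3][..., None, None]
    # A pixel is occupied if its centre lies inside any box of its world.
    inside = (np.abs(xs - cx) <= hw) & (np.abs(ys - cy) <= hh)  # (B, N, H, W)
    return inside.any(axis=1).astype(np.uint8)

# Two worlds with one box each: one at the origin, one outside the view.
batch = np.array([
    [[0.0, 0.0, 4.0, 2.0]],
    [[100.0, 100.0, 4.0, 2.0]],
])
images = rasterize_boxes(batch)
```

The whole batch is rendered with a handful of vectorized comparisons and no per-agent Python loop, which is the kind of structure that lets a simplified world reach high step rates on accelerators.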
Topics
- End-to-End Driving
- Self-Play
- Autonomous Driving
- Driving Simulation
- Reinforcement Learning
- Sim-to-Real Transfer
Best for: Research Scientist, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.