Beyond Conservative Automated Driving in Multi-Agent Scenarios via Coupled Model Predictive Control and Deep Reinforcement Learning
Summary
A new integrated MPC-RL framework has been developed to enhance automated driving performance in complex multi-agent scenarios, specifically at unsignalized intersections. This framework couples a Deep Reinforcement Learning (RL) agent for high-level speed guidance with a Model Predictive Control (MPC) controller for low-level trajectory optimization and constraint enforcement. Experiments conducted in the Highway-Env simulation environment across three traffic-density levels (Easy, Moderate, Hard) demonstrate that MPC-RL significantly outperforms standalone MPC and end-to-end RL. The integrated approach reduced the collision rate by 21% and improved the success rate by 6.5% compared to pure MPC. Furthermore, the framework exhibited robust zero-shot transferability to a highway merging scenario without retraining, highlighting the MPC backbone's role in cross-scenario generalization. MPC-RL also showed faster loss stabilization during training, indicating a reduced learning burden.
Key takeaway
For research scientists developing autonomous driving systems, this integrated MPC-RL framework offers a compelling approach to balance safety and efficiency in multi-agent environments. You should consider adopting a coupled architecture where RL provides adaptive speed references and MPC handles constrained trajectory optimization, as this design significantly improves performance and generalization compared to standalone methods. This approach also reduces the learning burden, leading to faster training convergence.
Key insights
Coupling RL for speed guidance with MPC for trajectory optimization improves autonomous driving safety and efficiency.
Principles
- MPC provides robust constraint handling and generalization.
- RL learns adaptive behaviors from complex interactions.
- Keeping MPC collision avoidance active during training prevents a mismatch between training and deployment behavior.
Method
The RL component outputs a normalized speed multiplier, which is scaled to a reference speed for the MPC controller. The MPC then solves a finite-horizon optimization problem to generate control inputs that track the RL-recommended speed while enforcing constraints.
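The coupling described above can be illustrated with a minimal sketch. It is not the paper's implementation: the speed range, acceleration bounds, horizon, cost weights, the `[0, 1]` multiplier range, and the names `reference_speed`, `rollout`, and `mpc_step` are all assumptions made for illustration. The optimizer is a simple search over constant-acceleration plans for a 1D vehicle approaching a static obstacle, standing in for a real constrained MPC solver.

```python
import numpy as np

# All constants below are hypothetical, chosen only for illustration.
V_MIN, V_MAX = 0.0, 10.0   # assumed speed range in m/s
A_MIN, A_MAX = -4.0, 2.0   # assumed acceleration bounds in m/s^2
DT, HORIZON = 0.2, 15      # assumed step size in s and number of steps


def reference_speed(multiplier):
    """Scale a normalized RL output (assumed to lie in [0, 1]) to a speed."""
    m = float(np.clip(multiplier, 0.0, 1.0))
    return V_MIN + m * (V_MAX - V_MIN)


def rollout(pos, vel, accel):
    """Simulate one constant-acceleration plan over the horizon."""
    positions, speeds = [], []
    for _ in range(HORIZON):
        # Speed is clamped so the plan never reverses or exceeds V_MAX.
        vel = float(np.clip(vel + accel * DT, V_MIN, V_MAX))
        pos = pos + vel * DT
        positions.append(pos)
        speeds.append(vel)
    return np.array(positions), np.array(speeds)


def mpc_step(pos, vel, v_ref, obstacle_pos, safe_gap=5.0, n_candidates=61):
    """Return the acceleration of the cheapest feasible plan.

    Plans that enter the safety gap in front of the obstacle are rejected,
    so the constraint holds regardless of the speed the RL agent requests.
    """
    best_cost, best_accel = np.inf, A_MIN  # fall back to full braking
    for accel in np.linspace(A_MIN, A_MAX, n_candidates):
        positions, speeds = rollout(pos, vel, accel)
        if np.any(positions > obstacle_pos - safe_gap):
            continue  # hard constraint violated, discard this plan
        # Track the RL-recommended speed, with a small penalty on effort.
        cost = np.sum((speeds - v_ref) ** 2) + 0.1 * accel ** 2
        if cost < best_cost:
            best_cost, best_accel = cost, accel
    return float(best_accel)


# The RL agent asks for 80% of the speed range; the MPC decides how to act.
v_ref = reference_speed(0.8)
free_road = mpc_step(pos=0.0, vel=5.0, v_ref=v_ref, obstacle_pos=1000.0)
blocked = mpc_step(pos=0.0, vel=5.0, v_ref=v_ref, obstacle_pos=10.0)
```

In this sketch the same speed request leads to acceleration on a free road and to braking when an obstacle is close, which is the division of labor the summary describes: the learned component proposes a speed, and the optimizer decides how far that proposal can be followed within the constraints.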
In practice
- Integrate RL for high-level decision-making.
- Use MPC for low-level control and safety guarantees.
- Train with collision avoidance active in MPC.
Topics
- Model Predictive Control
- Deep Reinforcement Learning
- Autonomous Navigation
- Multi-Agent Interaction
- Unsignalized Intersections
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.