Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning
Summary
The article introduces a distribution-agnostic framework for robust trajectory optimization that uses chance-constrained reinforcement learning to handle uncertainty in initial conditions and process noise. The approach first computes a deterministic nominal trajectory offline, then uses reinforcement learning to robustify that baseline through a structured affine closed-loop correction law that combines feedforward control adjustments with time-varying feedback gains. Probabilistic feasibility is enforced empirically through rollout-based upper-tail quantiles, and terminal dispersion is controlled with covariance-feasibility penalties. The framework was assessed on two problems: a three-dimensional multi-impulse Earth-Mars transfer and a stochastic atmospheric pinpoint rocket landing. The reported results show competitive upper-tail fuel cost with probabilistic feasibility preserved, indicating that the robustification scaffold carries over across different spacecraft trajectory-planning scenarios without redesigning the core stochastic-control structure.
Key takeaway
For aerospace engineers designing complex spacecraft trajectories, this framework offers a way to manage uncertainty without redesigning the core stochastic-control structure for each problem. You compute a nominal path offline and then robustify it with reinforcement learning, enforcing probabilistic feasibility and controlling terminal dispersion under sampled uncertainty in initial conditions and process noise. Consider this approach for multi-impulse transfers or pinpoint landings where upper-tail fuel cost and mission reliability under uncertainty both matter.
Key insights
A chance-constrained reinforcement learning framework robustifies nominal trajectories against sampled uncertainty using affine closed-loop corrections.
Principles
- Robust trajectory optimization can be posed as a learned correction to a deterministic nominal baseline.
- Probabilistic feasibility is enforceable via rollout-based quantiles.
- A single robustification scaffold can span diverse problems.
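The second principle can be illustrated with a minimal sketch. It assumes each rollout produces a scalar constraint margin g, with g <= 0 meaning the trajectory is feasible; a chance constraint at risk level epsilon is then checked empirically by requiring the (1 - epsilon) sample quantile of g to be non-positive. The function names and the margin convention are illustrative assumptions, not taken from the original article.

```python
import math


def upper_tail_quantile(values, level):
    """Smallest sample v such that at least a fraction `level` of samples are <= v."""
    if not values:
        raise ValueError("need at least one rollout")
    if not 0.0 < level <= 1.0:
        raise ValueError("level must lie in (0, 1]")
    ordered = sorted(values)
    rank = math.ceil(level * len(ordered))  # 1-based order statistic
    return ordered[rank - 1]


def chance_constraint_holds(margins, epsilon):
    """Empirical check of Pr[g <= 0] >= 1 - epsilon from rollout margins.

    `margins` holds one hypothetical constraint margin g per rollout.
    """
    return upper_tail_quantile(margins, 1.0 - epsilon) <= 0.0
```

In a training loop, the same quantile could feed a penalty term rather than a hard pass/fail check; that choice is left open here.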
Method
Compute a nominal trajectory offline, then use RL to learn an affine closed-loop correction law with feedforward adjustments and time-varying feedback gains. Enforce probabilistic feasibility via rollout-based upper-tail quantiles, and regulate terminal dispersion with covariance-feasibility penalties.
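As a rough illustration of the correction law described above, the sketch below applies u[k] = u_nom[k] + v[k] + K[k] * (x[k] - x_nom[k]) to a scalar toy system and scores terminal dispersion with a simple variance-excess penalty. The dynamics, the nominal trajectory, the noise levels, the fixed gains, and the penalty form are all assumptions made for illustration; in the article the feedforward terms and gains are learned with RL, which this sketch does not attempt.

```python
import random


def rollout(x0, x_nom, u_nom, v, K, noise_std, rng):
    """Simulate the toy dynamics x[k+1] = x[k] + u[k] + w[k] (an assumption)."""
    x = x0
    states = [x]
    for k in range(len(u_nom)):
        # Affine correction: nominal control + feedforward + feedback on deviation.
        u = u_nom[k] + v[k] + K[k] * (x - x_nom[k])
        w = rng.gauss(0.0, noise_std)  # process noise
        x = x + u + w
        states.append(x)
    return states


def terminal_dispersion_penalty(terminal_states, target, var_max):
    """Squared mean miss plus any terminal variance in excess of var_max."""
    n = len(terminal_states)
    mean = sum(terminal_states) / n
    var = sum((s - mean) ** 2 for s in terminal_states) / n
    return (mean - target) ** 2 + max(0.0, var - var_max)


def sample_terminal_states(K, n_rollouts, seed):
    """Monte Carlo terminal states for a fixed gain schedule K (with v = 0)."""
    rng = random.Random(seed)
    x_nom = [0.0, 1.0, 2.0, 3.0]  # hypothetical nominal states
    u_nom = [1.0, 1.0, 1.0]       # hypothetical nominal controls
    v = [0.0, 0.0, 0.0]           # feedforward adjustments (zero here)
    finals = []
    for _ in range(n_rollouts):
        x0 = rng.gauss(0.0, 0.2)  # uncertain initial condition
        finals.append(rollout(x0, x_nom, u_nom, v, K, 0.1, rng)[-1])
    return finals


# Compare no feedback against a hand-picked stabilizing gain schedule.
open_loop = terminal_dispersion_penalty(
    sample_terminal_states([0.0, 0.0, 0.0], 500, seed=0), target=3.0, var_max=0.01)
closed_loop = terminal_dispersion_penalty(
    sample_terminal_states([-0.5, -0.5, -0.5], 500, seed=0), target=3.0, var_max=0.01)
```

With zero noise and zero initial deviation the corrected rollout reproduces the nominal trajectory exactly, which is the property that lets the correction act as a perturbation around the offline baseline.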
In practice
- Design robust policies for Earth-Mars transfers.
- Adapt the same scaffold to stochastic rocket-landing problems.
- Evaluate policies under diverse uncertainty distributions.
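The last item above can be sketched as a small Monte Carlo harness. It assumes a fixed scalar feedback gain acting on a toy deviation loop and compares the upper-tail terminal miss under three disturbance models with the same standard deviation. The loop dynamics, the gain, the choice of distributions, and all names are hypothetical and stand in for a trained policy and a real simulator.

```python
import math
import random


def terminal_miss(noise_sampler, rng, gain=-0.5, steps=3):
    """Terminal |deviation| for the toy loop e[k+1] = (1 + gain) * e[k] + w[k]."""
    e = 0.0
    for _ in range(steps):
        e = (1.0 + gain) * e + noise_sampler(rng)
    return abs(e)


def upper_tail_quantile(values, level):
    """Empirical quantile taken as an order statistic of the sorted samples."""
    ordered = sorted(values)
    return ordered[math.ceil(level * len(ordered)) - 1]


STD = 0.1  # common standard deviation for all three disturbance models

SAMPLERS = {
    "gaussian": lambda rng: rng.gauss(0.0, STD),
    "uniform": lambda rng: rng.uniform(-math.sqrt(3.0) * STD, math.sqrt(3.0) * STD),
    # Laplace noise as the difference of two exponentials with scale STD / sqrt(2).
    "laplace": lambda rng: (rng.expovariate(math.sqrt(2.0) / STD)
                            - rng.expovariate(math.sqrt(2.0) / STD)),
}


def evaluate(n_rollouts=2000, level=0.95, seed=0):
    """Upper-tail terminal miss of the same policy under each disturbance model."""
    report = {}
    for name, sampler in SAMPLERS.items():
        rng = random.Random(seed)
        misses = [terminal_miss(sampler, rng) for _ in range(n_rollouts)]
        report[name] = upper_tail_quantile(misses, level)
    return report


report = evaluate()
```

Matching the standard deviation across models isolates the effect of distribution shape, so any spread between the reported quantiles reflects tail behavior rather than noise magnitude.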
Topics
- Trajectory Optimization
- Reinforcement Learning
- Chance-Constrained Control
- Robust Control
- Spacecraft Trajectory Planning
- Stochastic Control
Best for: Research Scientist, AI Scientist, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.