Distribution-Agnostic Robust Trajectory Optimization via Chance-Constrained Reinforcement Learning

Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

This work introduces a distribution-agnostic framework for robust trajectory optimization that uses chance-constrained reinforcement learning to handle uncertainty in initial conditions and process noise. A deterministic nominal trajectory is first computed offline. Reinforcement learning then robustifies this baseline through a structured affine closed-loop correction law that combines feedforward control adjustments with time-varying feedback gains. Probabilistic feasibility is enforced empirically through upper-tail quantiles estimated from rollouts, and terminal dispersion is controlled with covariance-feasibility penalties. The framework is evaluated on two problems: a three-dimensional multi-impulse Earth-Mars transfer and a stochastic atmospheric pinpoint rocket landing. The reported results show competitive upper-tail fuel cost with probabilistic feasibility preserved, which suggests the robustification scaffold carries over across spacecraft trajectory-planning scenarios without redesigning the core stochastic-control structure.

Key takeaway

For aerospace engineers designing spacecraft trajectories under uncertainty, this framework offers a way to robustify a design without redesigning the core stochastic-control structure. You compute a nominal path offline, then use reinforcement learning to add an affine closed-loop correction that targets probabilistic feasibility and controlled terminal dispersion under sampled disturbances. It is worth considering for multi-impulse transfers and pinpoint landings, the two cases evaluated, where the reported results show competitive upper-tail fuel cost.

Key insights

A chance-constrained reinforcement learning framework robustifies nominal trajectories against sampled uncertainty using affine closed-loop corrections.

Method

Compute a nominal trajectory offline, then use RL to learn an affine closed-loop correction law with feedforward adjustments and time-varying feedback gains. Probabilistic feasibility is enforced via rollout-based upper-tail quantiles, and terminal dispersion is regulated with covariance penalties.
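
The method can be illustrated with a minimal sketch. It is not the paper's implementation. The double-integrator dynamics, the hand-set gains (standing in for what RL would learn), and every name and threshold below (`affine_control`, `evaluate`, `miss_tol`, `cov_max`) are assumptions chosen for illustration. The sketch applies an affine correction around a nominal trajectory, estimates upper-tail quantiles from Monte Carlo rollouts, and forms a covariance penalty on terminal dispersion.

```python
import numpy as np

def affine_control(u_nom_t, k_ff_t, K_t, x_t, x_nom_t):
    """Nominal control + feedforward adjustment + feedback on the state deviation."""
    return u_nom_t + k_ff_t + K_t @ (x_t - x_nom_t)

def evaluate(K, k_ff, n_rollouts=500, dt=0.1, x0_std=0.1, noise_std=0.01,
             alpha=0.95, miss_tol=0.2, cov_max=0.01, seed=0):
    """Roll out the corrected policy and return rollout-based statistics.

    K has shape (T, 1, 2) (time-varying feedback gains) and k_ff has shape
    (T, 1) (feedforward adjustments). All defaults are hypothetical.
    """
    rng = np.random.default_rng(seed)
    T = K.shape[0]
    # Toy double integrator (assumption): state = [position, velocity].
    A = np.array([[1.0, dt], [0.0, 1.0]])
    B = np.array([[0.5 * dt**2], [dt]])
    x_nom = np.array([1.0, 0.0])  # toy nominal: hold position with zero control
    u_nom = np.zeros(1)

    finals = np.empty((n_rollouts, 2))
    efforts = np.empty(n_rollouts)
    for i in range(n_rollouts):
        # Sample initial-condition uncertainty around the nominal state.
        x = x_nom + x0_std * rng.standard_normal(2)
        effort = 0.0
        for t in range(T):
            u = affine_control(u_nom, k_ff[t], K[t], x, x_nom)
            effort += float(np.abs(u).sum())  # crude fuel proxy
            # Propagate with additive process noise.
            x = A @ x + B @ u + noise_std * rng.standard_normal(2)
        finals[i] = x
        efforts[i] = effort

    # Empirical upper-tail quantiles over the rollouts.
    miss = np.linalg.norm(finals - x_nom, axis=1)
    q_miss = float(np.quantile(miss, alpha))
    q_effort = float(np.quantile(efforts, alpha))
    # Terminal dispersion: largest eigenvalue of the sample covariance.
    lam_max = float(np.linalg.eigvalsh(np.cov(finals, rowvar=False))[-1])
    return {
        "q_miss": q_miss,
        "q_effort": q_effort,
        "chance_penalty": max(0.0, q_miss - miss_tol),
        "cov_penalty": max(0.0, lam_max - cov_max),
    }

T = 50
zero_ff = np.zeros((T, 1))
K_open = np.zeros((T, 1, 2))                           # no correction
K_fb = np.tile(np.array([[-2.0, -3.0]]), (T, 1, 1))   # hand-set gains, not learned

open_loop = evaluate(K_open, zero_ff)
closed_loop = evaluate(K_fb, zero_ff)
print("open loop  :", open_loop)
print("closed loop:", closed_loop)
```

In an RL setting, the entries of `K` and `k_ff` would be the policy parameters, and the two penalties would enter the reward alongside the fuel term. Here the gains are fixed by hand only to show how the rollout statistics respond to feedback.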

Best for: Research Scientist, AI Scientist, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.