Gradient-based Planning for World Models at Longer Horizons
Summary
GRASP (Gradient RelAxed Stochastic Planner) is a new gradient-based planning method designed to improve long-horizon planning with large, learned "world models." Traditional planning methods struggle at long horizons for three reasons: gradients explode or vanish across the rollout, the optimization landscape requires non-greedy behavior, and deep learning-based world models lack adversarial robustness, which makes their state-input gradients especially unreliable. GRASP addresses these challenges in three ways. It lifts the trajectory into virtual states that can be optimized in parallel, adds stochasticity to the state iterates for exploration, and reshapes gradients so that they rely only on robust action Jacobians rather than brittle state Jacobians. The result is higher success rates and shorter planning times on tasks like Push-T: at H=50, GRASP achieves 43.4% success in 15.2 seconds, compared to 30.2% in 96.2 seconds for CEM.
Key takeaway
For research scientists developing control systems with learned world models, GRASP addresses the fragility of long-horizon planning. Consider adopting its three principles (parallel state optimization, stochastic state exploration, and dependence on action gradients only) to mitigate exploding gradients and poor adversarial robustness. The method could also be integrated into closed-loop systems or RL policy learning for adaptive long-horizon control.
Key insights
GRASP enables robust, long-horizon planning with learned world models by re-engineering gradient flow and introducing state-level stochasticity.
Principles
- Long-horizon planning requires non-greedy behavior.
- State-input gradients in deep world models are brittle.
- Action gradients are more robust for optimization.
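A short derivation sketch shows why the last two principles matter. The notation is an assumption for illustration and is not taken from the original article: $f$ is the learned world model with $s_{t+1} = f(s_t, a_t)$, $H$ is the horizon, and $\mathrm{sg}(\cdot)$ denotes stop-gradient. The second expression is one plausible form of a stop-gradient collocation term, not necessarily the exact loss GRASP uses.

```latex
% Rollout gradient: one state Jacobian per intervening step
\frac{\partial s_H}{\partial a_t}
  = \frac{\partial f}{\partial s}(s_{H-1}, a_{H-1}) \cdots
    \frac{\partial f}{\partial s}(s_{t+1}, a_{t+1}) \,
    \frac{\partial f}{\partial a}(s_t, a_t)

% Stop-gradient collocation term for step t (assumed form)
\ell_t = \left\| s_{t+1} - f(\mathrm{sg}(s_t), a_t) \right\|^2,
\qquad
\frac{\partial \ell_t}{\partial a_t}
  = -2 \left( \frac{\partial f}{\partial a}(s_t, a_t) \right)^{\top}
       \left( s_{t+1} - f(s_t, a_t) \right)
```

The rollout gradient multiplies $H - 1 - t$ state Jacobians, so it can explode or vanish as $H$ grows and inherits any brittleness in those Jacobians. The collocation term involves a single action Jacobian regardless of horizon.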
Method
GRASP optimizes a collocation-based objective that combines a stop-gradient dynamics loss with dense goal shaping. It injects Gaussian noise into the state updates for exploration and periodically syncs with true rollout gradients to refine the plan.
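The loop described above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the scalar linear `model` stands in for a learned world model, gradients are written by hand so that stop-gradient amounts to omitting the state-Jacobian term, goal shaping is applied to the virtual states, and every name and hyperparameter (`grasp_like_plan`, `goal_weight`, `sync_every`, and so on) is hypothetical.

```python
import random

# Toy stand-in for a learned world model (assumption): s' = A*s + B*a.
A, B = 0.9, 1.0


def model(s, a):
    return A * s + B * a


def rollout(s0, actions):
    """Roll the model forward; returns [s0, s1, ..., sH]."""
    states = [s0]
    for a in actions:
        states.append(model(states[-1], a))
    return states


def grasp_like_plan(s0, goal, horizon=5, iters=400, lr_s=0.1, lr_a=0.5,
                    goal_weight=4.0, noise0=0.05, sync_every=50,
                    lr_sync=0.1, seed=0):
    rng = random.Random(seed)
    actions = [0.0] * horizon
    states = rollout(s0, actions)  # states[0] is fixed, states[1:] are virtual

    for k in range(iters):
        # Dynamics residual per step. Each step only needs its own
        # (state, action, next state), so these could run in parallel.
        resid = [states[t + 1] - model(states[t], actions[t])
                 for t in range(horizon)]
        # Noise scale decays linearly and is zero for the second half.
        sigma = noise0 * max(0.0, 1.0 - 2.0 * k / iters)

        new_states = [s0]
        for t in range(horizon):
            # Gradient w.r.t. the virtual state states[t+1]: it appears as
            # the target of step t and in the goal term. Its role as the
            # input of step t+1 is stop-gradiented, so there is no A term.
            grad_s = (2.0 * resid[t]
                      + 2.0 * goal_weight * (states[t + 1] - goal))
            new_states.append(states[t + 1] - lr_s * grad_s
                              + sigma * rng.gauss(0.0, 1.0))
        # Action gradient uses only the action Jacobian B.
        actions = [actions[t] + lr_a * 2.0 * B * resid[t]
                   for t in range(horizon)]
        states = new_states

        if (k + 1) % sync_every == 0:
            # Sync: take one step on the terminal loss using the full
            # rollout gradient, then reset virtual states to the rollout.
            err = rollout(s0, actions)[-1] - goal
            actions = [actions[t]
                       - lr_sync * 2.0 * err * A ** (horizon - 1 - t) * B
                       for t in range(horizon)]
            states = rollout(s0, actions)
    return actions


if __name__ == "__main__":
    plan = grasp_like_plan(s0=0.0, goal=1.0)
    print("final state of true rollout:", rollout(0.0, plan)[-1])
```

The step sizes and goal weight are tuned for this toy system only. With a real learned model, the hand-written gradients would typically be replaced by automatic differentiation with an explicit stop-gradient on the state input.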
In practice
- Use collocation for parallelizing long-horizon planning.
- Avoid direct optimization through state-input gradients.
- Introduce stochasticity in state updates for exploration.
Topics
- GRASP
- World Models
- Long-Horizon Planning
- Gradient-Based Planning
- Adversarial Robustness
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by The Berkeley Artificial Intelligence Research Blog.