Gradient-based Planning for World Models at Longer Horizons

2026-04-20 · Source: The Berkeley Artificial Intelligence Research Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, long

Summary

GRASP (Gradient RelAxed Stochastic Planner) is a gradient-based planning method designed to improve long-horizon planning with large learned world models. Planning with such models becomes difficult at long horizons for three reasons: gradients explode or vanish when backpropagated through many model steps, good plans often require non-greedy behavior, and deep world models are adversarially fragile, especially in their state-input gradients. GRASP addresses these problems by lifting the trajectory into virtual states that are optimized in parallel, adding stochasticity to the state iterates for exploration, and reshaping gradients so that they depend only on the more robust action Jacobians rather than on brittle state Jacobians. On Push-T at horizon H=50, GRASP reaches a 43.4% success rate in 15.2 seconds of planning, compared with 30.2% in 96.2 seconds for CEM (the cross-entropy method), roughly a sixfold reduction in planning time.

Key takeaway

For research scientists developing control systems with learned world models, GRASP offers a robust approach to long-horizon planning, which is otherwise fragile. Consider adopting its three principles (parallel state optimization, stochastic state exploration, and dependence on action gradients only) to counter exploding gradients and adversarial fragility. GRASP could also be integrated into closed-loop systems or RL policy learning for adaptive long-horizon control.

Key insights

GRASP enables robust, long-horizon planning with learned world models by re-engineering gradient flow and introducing state-level stochasticity.

Principles

Long-horizon planning requires non-greedy behavior.
State-input gradients in deep world models are brittle.
Action gradients are more robust for optimization.
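
A toy calculation illustrates why gradients through a long rollout are hard to use while per-step action gradients stay well behaved. The sketch below assumes scalar linear dynamics `gain * state + action` as a stand-in for a learned world model; `step`, `final_state`, `rollout_sensitivity`, and `gain` are illustrative names, not taken from the original article. It covers only the exploding/vanishing effect and does not model adversarial brittleness.

```python
def step(state, action, gain):
    """Toy linear dynamics standing in for a learned world model (an assumption)."""
    return gain * state + action


def final_state(s0, actions, gain):
    """Roll the actions out sequentially and return the last state s_H."""
    state = s0
    for action in actions:
        state = step(state, action, gain)
    return state


def rollout_sensitivity(gain, horizon):
    """Return d s_H / d a_0 for an H-step rollout, accumulated by the chain rule."""
    d_step_d_action = 1.0  # action Jacobian of `step`, independent of the horizon
    d_step_d_state = gain  # state Jacobian of `step`
    sensitivity = d_step_d_action  # d s_1 / d a_0
    for _ in range(horizon - 1):
        # Every additional step multiplies in one more state Jacobian.
        sensitivity *= d_step_d_state
    return sensitivity


for gain in (0.8, 1.0, 1.2):
    print(gain, [rollout_sensitivity(gain, h) for h in (1, 10, 50)])
```

With a gain below 1 the sensitivity of the final state to the first action shrinks geometrically with the horizon, and with a gain above 1 it grows geometrically, while the one-step action Jacobian is 1 in every case.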

Method

GRASP uses a collocation-based objective that combines a stop-gradient dynamics loss with dense goal shaping. It injects Gaussian noise into the state updates for exploration and periodically syncs with true rollout gradients for refinement.
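
The ingredients above can be sketched on a toy problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the world model is a hypothetical 1-D point that moves by `action`, gradients are written out by hand, the goal-shaping term is assumed to act on the model's one-step predictions, and the noise schedule and all hyperparameters (`lr`, `gamma`, `noise`, `iters`) are invented for the example. The periodic sync with true rollout gradients is omitted.

```python
import random


def step(state, action):
    """Toy stand-in for a learned world model: a 1-D point moved by `action`."""
    return state + action


def rollout(s0, actions):
    """Apply the actions sequentially from s0 and return the visited states."""
    visited, state = [], s0
    for action in actions:
        state = step(state, action)
        visited.append(state)
    return visited


def plan(s0, goal, horizon=10, iters=600, lr=0.25, gamma=1.0, noise=0.1, seed=0):
    rng = random.Random(seed)
    states = [s0] * horizon    # virtual states s_1..s_H, free optimization variables
    actions = [0.0] * horizon  # actions a_0..a_{H-1}
    for k in range(iters):
        # Model inputs s_0..s_{H-1} are treated as constants (stop-gradient),
        # so all H one-step predictions are independent of each other.
        inputs = [s0] + states[:-1]
        preds = [step(s, a) for s, a in zip(inputs, actions)]
        # Assumed schedule: noise decays linearly to zero by the halfway point.
        sigma = noise * max(0.0, 1.0 - 2.0 * k / iters)
        for t in range(horizon):
            resid = preds[t] - states[t]  # dynamics residual for step t
            # Action gradient: only d step / d action is used (equal to 1 here).
            actions[t] -= lr * (2.0 * resid + 2.0 * gamma * (preds[t] - goal))
            # State gradient: s_{t+1} appears only as the residual's target;
            # nothing is backpropagated through the model's state input.
            states[t] -= lr * (-2.0 * resid)
            # Gaussian noise on the state iterate, for exploration.
            states[t] += sigma * rng.gauss(0.0, 1.0)
    return actions, states


actions, states = plan(s0=0.0, goal=5.0)
print("final state under a true rollout:", rollout(0.0, actions)[-1])
```

Because this toy has unbounded actions and no obstacles, the planner simply learns to jump to the goal and stay there, and the noise is not needed to find a solution; it is included only to show where it enters the update. Checking the plan with `rollout` matters because the virtual states are only consistent with the model once the dynamics residuals have been driven to zero.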

In practice

Use collocation for parallelizing long-horizon planning.
Avoid direct optimization through state-input gradients.
Introduce stochasticity in state updates for exploration.

Topics

GRASP
World Models
Long-Horizon Planning
Gradient-Based Planning
Adversarial Robustness

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by The Berkeley Artificial Intelligence Research Blog.