PLAN-S: Bridging Planning with Latent Style Dynamics for Autonomous Driving World Models

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

PLAN-S is a planner-facing bridge for latent world models (LWMs) in autonomous driving. It explicitly decodes a style-conditioned, four-channel semantic cost map from the latent representation. The map covers dynamic obstacles, off-road regions, static obstacles, and drivability, and is conditioned on ego state and driving style through a dual AdaFiLM mechanism. PLAN-S plugs into regression planners through attention-level fusion and into anchor-score planners through reward-level fusion, with the host backbones kept frozen. It was validated on ResWorld with nuScenes and on WoTE with NAVSIM. On nuScenes, PLAN-S reduced L2 error to 0.55 m and cut the 3 s collision rate by 42%. On NAVSIM, the rule-cost variant reached a Predictive Driver Model Score (PDMS) of 89.4, and the learned-cost variant gave complementary gains on challenging scenes. The module adds only 0.25 million parameters and runs at 17.0 frames per second.

Key takeaway

For machine learning engineers building autonomous driving systems, PLAN-S shows one way to add explicit, style-conditioned spatial costs to a latent world model. Consider a similar planner-facing bridge where trajectory safety and interpretability matter, particularly for collision reduction. The decoded cost map makes the modeled risk inspectable and supports different style preferences, and the reported gains on challenging driving scenarios come without significant inference overhead.

Key insights

PLAN-S enhances autonomous driving LWMs by explicitly modeling style-conditioned spatial costs for improved controllability and safety.

Principles

Explicitly model risk, drivability, and style preferences.
Organize latent representations as spatial costs.
Ensure portability across planner families.

Method

PLAN-S decodes a four-channel semantic cost map from bird's-eye-view (BEV) latent features, conditioned on ego state and driving style via dual AdaFiLM. The cost map reaches the planner through attention-level fusion for regression planners or reward-level fusion for anchor-score planners.
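
To make the conditioning step concrete, here is a minimal sketch of FiLM-style modulation (a per-feature scale and shift predicted from a conditioning vector) applied twice, once from ego state and once from driving style, before a small head produces four cost channels per BEV cell. The summary does not spell out the internals of AdaFiLM, so everything here is an assumption for illustration: the function names, the order of the two modulations, the sigmoid head, and the random weights are all hypothetical and are not the authors' implementation.

```python
import math
import random

def linear(weights, bias, x):
    """Dense layer: weights is a list of rows, one row per output unit."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]

def film_params(weights, bias, cond, dim):
    """Map a conditioning vector to per-feature (gamma, beta).

    Assumption: gamma is centred on 1 so that a zero conditioning
    vector (with zero bias) leaves the features unchanged.
    """
    out = linear(weights, bias, cond)  # length 2 * dim
    gamma = [1.0 + g for g in out[:dim]]
    beta = out[dim:]
    return gamma, beta

def film(feature, gamma, beta):
    """Feature-wise linear modulation: gamma * feature + beta."""
    return [g * f + b for f, g, b in zip(feature, gamma, beta)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def decode_cost_map(bev, ego, style, params):
    """Decode an H x W x 4 cost map from an H x W x D latent BEV grid.

    Hypothetical 'dual' scheme: modulate by ego state, then by style.
    """
    dim = len(bev[0][0])
    g_ego, b_ego = film_params(params["ego_w"], params["ego_b"], ego, dim)
    g_sty, b_sty = film_params(params["style_w"], params["style_b"], style, dim)
    cost_map = []
    for row in bev:
        out_row = []
        for cell in row:
            h = film(cell, g_ego, b_ego)   # ego-state modulation
            h = film(h, g_sty, b_sty)      # style modulation
            logits = linear(params["head_w"], params["head_b"], h)
            out_row.append([sigmoid(z) for z in logits])  # costs in (0, 1)
        cost_map.append(out_row)
    return cost_map

def random_params(dim, ego_dim, style_dim, n_channels=4, seed=0):
    """Random stand-in weights; a real system would learn these."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {
        "ego_w": mat(2 * dim, ego_dim), "ego_b": [0.0] * (2 * dim),
        "style_w": mat(2 * dim, style_dim), "style_b": [0.0] * (2 * dim),
        "head_w": mat(n_channels, dim), "head_b": [0.0] * n_channels,
    }

# Usage: same scene and ego state, two different (one-hot) style vectors.
params = random_params(dim=3, ego_dim=2, style_dim=2)
bev = [[[0.1, -0.2, 0.3] for _ in range(4)] for _ in range(5)]
cautious = decode_cost_map(bev, [1.0, 0.0], [1.0, 0.0], params)
assertive = decode_cost_map(bev, [1.0, 0.0], [0.0, 1.0], params)
```

Because only the small conditioning and head weights are involved, a design along these lines is compatible with keeping the host backbone frozen, which is the property the summary emphasises.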

In practice

Use a 4-channel cost map for explicit risk/drivability.
Apply dual AdaFiLM for style-conditioned modulation.
Integrate cost maps before final trajectory selection.
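
As a rough illustration of the last point, the sketch below shows one possible form of reward-level fusion for an anchor-score planner: each candidate trajectory's planner score is penalised by the average cost it picks up from the map, and selection happens on the fused score. The summary does not give the fusion formula, so the weighted-sum cost, the `lam` trade-off weight, the grid convention, and the channel order (dynamic obstacles, off-road, static obstacles, drivability) are all assumptions made for this example.

```python
import math

def trajectory_cost(traj, cost_map, channel_weights,
                    resolution=1.0, origin=(0.0, 0.0), out_of_map_cost=1.0):
    """Average weighted cost sampled at each (x, y) waypoint.

    cost_map is H x W x C; rows index y and columns index x (assumed).
    Waypoints outside the grid receive a fixed penalty.
    """
    total = 0.0
    for x, y in traj:
        row = math.floor((y - origin[1]) / resolution)
        col = math.floor((x - origin[0]) / resolution)
        if 0 <= row < len(cost_map) and 0 <= col < len(cost_map[0]):
            cell = cost_map[row][col]
            total += sum(w * c for w, c in zip(channel_weights, cell))
        else:
            total += out_of_map_cost
    return total / len(traj)

def fuse_and_select(anchors, planner_scores, cost_map, channel_weights, lam=1.0):
    """Fuse planner scores with map costs, then pick the best anchor."""
    fused = [
        score - lam * trajectory_cost(traj, cost_map, channel_weights)
        for traj, score in zip(anchors, planner_scores)
    ]
    best = max(range(len(fused)), key=fused.__getitem__)
    return best, fused

# Usage: a 3 x 3 grid with a dynamic obstacle in cell (row 0, col 1).
empty = [0.0, 0.0, 0.0, 0.0]
cost_map = [[list(empty) for _ in range(3)] for _ in range(3)]
cost_map[0][1][0] = 1.0  # channel 0 assumed to be dynamic obstacles

anchors = [
    [(0.5, 0.5), (1.5, 0.5), (2.5, 0.5)],  # passes through the obstacle cell
    [(0.5, 0.5), (0.5, 1.5), (0.5, 2.5)],  # avoids it
]
planner_scores = [1.0, 0.8]          # the planner alone prefers anchor 0
weights = [1.0, 1.0, 1.0, 0.0]       # drivability left unweighted in this toy

best, fused = fuse_and_select(anchors, planner_scores, cost_map, weights, lam=1.0)
best_without_map, _ = fuse_and_select(anchors, planner_scores, cost_map, weights, lam=0.0)
```

In this toy case the fused score switches the selection from the obstacle-crossing anchor to the one that avoids it, while `lam=0.0` recovers the planner's original choice. A style-conditioned map would change the sampled costs, and therefore the ranking, without touching the planner itself.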

Topics

Autonomous Driving
Latent World Models
Semantic Cost Maps
Driving Style Personalization
Trajectory Planning
Deep Learning Architectures

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.