CLEAR: Cognition and Latent Evaluation for Adaptive Routing in End-to-End Autonomous Driving
Summary
CLEAR (Cognition and Latent Evaluation for Adaptive Routing) is an end-to-end autonomous driving framework that targets the latency of diffusion models in multi-modal maneuver generation by pairing fast generative planning with semantic reasoning. It uses Drive-JEPA as its visual encoder and replaces the multi-step denoising process with a single-step conditional drift in a VAE latent space, where a conditioning coefficient trades off diversity against precision. CLEAR also fine-tunes Qwen 3.5 0.8B on driving QA pairs to extract scene-aware hidden states. These states drive two components: an Adaptive Scheduler, which dynamically selects the conditioning coefficient alpha and the sample count N, and a cross-attention scorer, which picks the best trajectory from the generated candidates. On the NAVSIM v1 benchmark, CLEAR reports a leading PDMS of 93.7 while running at up to 99 FPS, which indicates that efficient, high-fidelity, multi-modal planning is possible without dense geometric annotations or iterative sampling.
Key takeaway
For autonomous driving engineers facing latency challenges in multi-modal planning, CLEAR is worth evaluating. Its core idea is a single-step conditional drift in a VAE latent space, coupled with a compact LLM for adaptive scheduling and trajectory scoring. The approach reaches a PDMS of 93.7 at up to 99 FPS on NAVSIM v1, suggesting that efficient, high-fidelity planning does not require iterative sampling or large-scale MLLMs. Consider prototyping this architecture if real-time responsiveness is a bottleneck in your system.
Key insights
CLEAR achieves efficient, multi-modal autonomous driving by combining single-step generative planning with LLM-driven cognitive reasoning.
Principles
- Multi-modal planning balances diversity and precision via conditioning.
- LLM hidden states encode traffic logic for adaptive decision-making.
- Single-step latent drift enables efficient, multi-modal trajectory generation.
Method
CLEAR employs Drive-JEPA for visual features and a fine-tuned Qwen 3.5 0.8B for semantic states. It uses a single-step conditional drift in VAE latent space for trajectory generation, guided by an Adaptive Scheduler and Cross-Attention Scorer.
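To make the diversity and precision trade-off concrete, here is a minimal sketch of single-step conditional sampling in a latent space. It is an illustration under stated assumptions, not the paper's formulation: the update rule `z = eps + alpha * (cond_mean - eps)`, the function names `single_step_drift` and `spread`, and the plain-list latent vectors are all hypothetical. In CLEAR itself the drift is learned and the result is decoded by a VAE, neither of which is modeled here.

```python
import random


def single_step_drift(cond_mean, alpha, n_samples, seed=0):
    """Draw n_samples latents, each moved toward cond_mean in one step.

    Assumed (hypothetical) update rule: z = eps + alpha * (cond_mean - eps),
    where eps is standard Gaussian noise. alpha = 0 keeps pure noise
    (maximum diversity); alpha = 1 collapses every sample onto cond_mean
    (maximum precision).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        eps = [rng.gauss(0.0, 1.0) for _ in cond_mean]
        z = [e + alpha * (m - e) for e, m in zip(eps, cond_mean)]
        samples.append(z)
    return samples


def spread(samples):
    """Mean per-dimension variance across samples, a crude diversity measure."""
    n = len(samples)
    dims = len(samples[0])
    total = 0.0
    for d in range(dims):
        values = [s[d] for s in samples]
        mean = sum(values) / n
        total += sum((v - mean) ** 2 for v in values) / n
    return total / dims


if __name__ == "__main__":
    cond = [1.0, -2.0, 0.5]  # stand-in for a conditional latent target
    for alpha in (0.2, 0.8):
        candidates = single_step_drift(cond, alpha, n_samples=16)
        print(f"alpha={alpha}: spread={spread(candidates):.4f}")
```

Under this assumed rule, a larger `alpha` yields tighter candidates around the conditional target, and a smaller one yields a wider set. There is no iterative denoising loop, so the cost is one update per sample.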
In practice
- Use a compact LLM for semantic feature extraction.
- Implement single-step latent drift for fast trajectory generation.
- Dynamically adjust planning diversity based on scene complexity.
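The practices above can be sketched as a small scheduling and selection routine. Everything here is a simplifying assumption rather than CLEAR's implementation: the scalar `complexity` input stands in for the LLM hidden states, the linear mapping and its ranges in `schedule` are invented for illustration, and `score_candidates` uses a single-query scaled dot product as a stand-in for the cross-attention scorer.

```python
import math


def schedule(complexity, alpha_range=(0.3, 0.9), n_range=(4, 32)):
    """Map a scene complexity in [0, 1] to (alpha, n_samples).

    Hypothetical linear rule: complex scenes get a lower alpha (more
    diverse candidates) and a larger sample count.
    """
    c = min(max(complexity, 0.0), 1.0)
    alpha_lo, alpha_hi = alpha_range
    n_lo, n_hi = n_range
    alpha = alpha_hi - c * (alpha_hi - alpha_lo)
    n_samples = round(n_lo + c * (n_hi - n_lo))
    return alpha, n_samples


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def score_candidates(scene_state, candidate_embeddings):
    """Scaled dot-product weight of each candidate against one scene query."""
    scale = math.sqrt(len(scene_state))
    logits = [
        sum(q * k for q, k in zip(scene_state, emb)) / scale
        for emb in candidate_embeddings
    ]
    return softmax(logits)


def select_best(scene_state, candidate_embeddings):
    """Return (index of the highest-weighted candidate, all weights)."""
    weights = score_candidates(scene_state, candidate_embeddings)
    best = max(range(len(weights)), key=weights.__getitem__)
    return best, weights


if __name__ == "__main__":
    alpha, n = schedule(complexity=0.5)
    print(f"alpha={alpha:.2f}, n_samples={n}")
    scene = [1.0, 0.0]  # toy stand-in for a scene-aware hidden state
    candidates = [[0.1, 5.0], [2.0, -1.0], [0.5, 0.5]]
    best, weights = select_best(scene, candidates)
    print(f"best candidate index: {best}")
```

Keeping the scheduler separate from the scorer means the sample budget is fixed before any candidate is generated, so simple scenes pay for only a few samples.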
Topics
- End-to-End Autonomous Driving
- Trajectory Planning
- Generative Models
- Large Language Models
- VAE Latent Space
- Adaptive Routing
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.