CLEAR: Cognition and Latent Evaluation for Adaptive Routing in End-to-End Autonomous Driving

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, long

Summary

CLEAR (Cognition and Latent Evaluation for Adaptive Routing) is an end-to-end autonomous driving framework designed to avoid the latency of diffusion models when generating multi-modal maneuvers. It pairs fast generative planning with semantic scene reasoning from a compact language model. The framework uses Drive-JEPA as its visual encoder and replaces the usual multi-step denoising process with a single-step conditional drift in a VAE latent space, where a conditioning coefficient (alpha) trades diversity against precision. CLEAR also fine-tunes Qwen 3.5 0.8B on driving QA pairs to extract scene-aware hidden states. These states guide two components: an Adaptive Scheduler, which dynamically selects the conditioning coefficient alpha and the sample count N, and a cross-attention scorer, which picks the best trajectory among the generated candidates. On the NAVSIM v1 benchmark, CLEAR reports a leading PDMS of 93.7 while running at up to 99 FPS, showing that efficient, high-fidelity, multi-modal planning is possible without dense geometric annotations or iterative sampling.
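The summary does not state CLEAR's drift equation, so the following is only a toy illustration of how one conditioning coefficient can trade diversity for precision in a single step. It assumes, hypothetically, that each candidate latent starts as Gaussian noise and is moved toward a scene-conditioned latent `mu_cond` in one update, giving `(1 - alpha) * eps + alpha * mu_cond`. The names `single_step_drift` and `toy_decoder`, and the linear "decoder", are illustrative stand-ins, not CLEAR's components.

```python
import numpy as np

def single_step_drift(mu_cond, alpha, n_samples, rng):
    """Hypothetical one-step conditional drift in a VAE latent space.

    Each candidate latent starts as Gaussian noise and is moved toward the
    scene-conditioned latent `mu_cond` in a single update. `alpha` in [0, 1]
    plays the role of the conditioning coefficient.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    eps = rng.standard_normal((n_samples, mu_cond.shape[-1]))  # (N, D) noise
    drift = alpha * (mu_cond - eps)   # one drift step, no iterative denoising loop
    return eps + drift                # equals (1 - alpha) * eps + alpha * mu_cond

def toy_decoder(z, horizon=8):
    """Hypothetical stand-in for a VAE decoder: (N, D) latents -> (N, horizon, 2) waypoints."""
    # Scales the first two latent dims along the horizon; a real decoder is a trained network.
    ramp = np.linspace(0.1, 1.0, horizon)[None, :, None]  # (1, horizon, 1)
    return ramp * z[:, None, :2]                          # (N, horizon, 2)

rng = np.random.default_rng(0)
mu_cond = np.array([1.0, 0.5, -0.2, 0.3])   # stand-in for a scene-conditioned latent
for alpha in (0.2, 0.8):
    z = single_step_drift(mu_cond, alpha, n_samples=64, rng=rng)
    trajs = toy_decoder(z)
    spread = z.std(axis=0).mean()            # shrinks roughly like (1 - alpha)
    print(f"alpha={alpha}: trajectories {trajs.shape}, latent spread {spread:.2f}")
```

In this toy, alpha near 1 collapses all N candidates toward the conditioned latent (precise but similar trajectories), while alpha near 0 leaves them close to the noise (diverse but loosely conditioned). Because generation is one vectorized update rather than a denoising loop, its cost does not grow with a number of diffusion steps.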

Key takeaway

For autonomous driving engineers facing latency constraints in multi-modal planning, CLEAR offers a compelling option. You should evaluate single-step conditional drift in a VAE latent space, paired with a compact LLM for adaptive scheduling and trajectory scoring. This approach reaches top performance (PDMS 93.7) at high frame rates (up to 99 FPS) on NAVSIM v1, showing that efficient, high-fidelity planning does not require iterative sampling or large-scale MLLMs. Consider prototyping this architecture to improve your system's real-time responsiveness and planning quality.

Key insights

CLEAR achieves efficient, multi-modal autonomous driving by combining single-step generative planning with LLM-driven cognitive reasoning.

Method

CLEAR uses Drive-JEPA for visual features and a fine-tuned Qwen 3.5 0.8B for scene-aware semantic hidden states. Trajectory candidates are generated by a single-step conditional drift in a VAE latent space; an Adaptive Scheduler sets the conditioning coefficient alpha and the sample count N, and a cross-attention scorer selects the final trajectory.
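The summary names an Adaptive Scheduler and a cross-attention scorer but does not describe their architectures. The sketch below assumes, hypothetically, a sigmoid head that maps a scene hidden state to alpha and to an integer N in a fixed range, and a single-head scaled dot-product cross-attention in which each candidate's embedding queries scene tokens before a linear layer produces one score. Random arrays stand in for the Qwen hidden state, trajectory embeddings, scene tokens, and trained weights; every function name and range here is illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def adaptive_schedule(h, w_alpha, w_n, n_min=4, n_max=32):
    """Hypothetical scheduler head: scene hidden state h -> (alpha, N)."""
    alpha = sigmoid(h @ w_alpha)                                 # conditioning coefficient in [0, 1]
    n = int(round(n_min + sigmoid(h @ w_n) * (n_max - n_min)))   # sample count in [n_min, n_max]
    return float(alpha), n

def cross_attention_scores(traj_emb, scene_tokens, w_q, w_k, w_v, w_out):
    """Hypothetical scorer: each candidate trajectory attends over scene tokens."""
    q = traj_emb @ w_q                                 # (N, d) one query per candidate
    k = scene_tokens @ w_k                             # (T, d)
    v = scene_tokens @ w_v                             # (T, d)
    attn = softmax(q @ k.T / np.sqrt(q.shape[-1]))     # (N, T) attention weights
    context = attn @ v                                 # (N, d) scene context per candidate
    return (q + context) @ w_out                       # (N,) one scalar score per candidate

# --- Usage with random stand-ins for trained weights and features ---
rng = np.random.default_rng(0)
d_h, d, n_tokens = 16, 8, 10
h = rng.standard_normal(d_h)                  # stand-in for the LLM hidden state
alpha, n = adaptive_schedule(h, rng.standard_normal(d_h), rng.standard_normal(d_h))

traj_emb = rng.standard_normal((n, d))        # stand-in embeddings of N candidates
scene = rng.standard_normal((n_tokens, d))    # stand-in scene tokens
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))
w_out = rng.standard_normal(d)

scores = cross_attention_scores(traj_emb, scene, w_q, w_k, w_v, w_out)
best = int(np.argmax(scores))                 # index of the selected trajectory
print(f"alpha={alpha:.2f}, N={n}, best candidate={best}")
```

Scoring all N candidates in one batched attention call, rather than in a Python loop, is one way to keep selection cheap as the scheduler raises N; the score of each candidate depends only on its own embedding and the shared scene tokens.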

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Robotics Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.