VLADriveBench: Evaluating CoT-Action Relationship in VLA for Autonomous Driving
Summary
VLADriveBench is a framework for evaluating the relationship between chain-of-thought (CoT) reasoning and actions in vision-language-action (VLA) models for autonomous driving. Existing benchmarks focus primarily on trajectory quality and do not check whether the generated CoT is relevant to, consistent with, or causally connected to the driving actions. VLADriveBench fills this gap by combining observational metrics (mentioning, hallucination, contradiction, and action alignment) with a CoT intervention protocol. Applied to three VLA models spanning two architectures, the framework revealed sharp divergences: ORION achieved the highest observational alignment scores, yet its CoT proved epiphenomenal (it did not causally drive the actions), whereas Alpamayo v1.5 scored lower observationally but showed a strongly causal CoT whose influence was gated by visual salience.
Key takeaway
For machine learning engineers developing or evaluating VLA models for autonomous driving, trajectory quality metrics alone are insufficient. Explicitly assess whether the model's chain-of-thought causally drives its driving actions: add VLADriveBench's observational metrics and CoT intervention protocol to your evaluation pipeline to confirm that the CoT genuinely influences behavior, and check how visual salience gates that influence.
Key insights
VLADriveBench evaluates the causal link between a VLA model's chain-of-thought and its driving actions, revealing that high observational alignment and genuine causal influence can diverge.
Principles
- Observational CoT alignment doesn't guarantee causality.
- CoT influence can be gated by visual salience.
Method
VLADriveBench combines observational metrics (mentioning, hallucination, contradiction, action alignment) with a CoT intervention protocol to assess CoT-action relationships in VLA models.
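The four observational metrics can be approximated with simple set and label comparisons. This is a minimal sketch, not VLADriveBench's implementation: it assumes each sample already carries hypothetical annotations (scene objects, the decision-relevant "salient" objects, the objects extracted from the CoT, the action the CoT states, and an action label derived from the predicted trajectory). The metric definitions and the `OPPOSITE_ACTIONS` table are illustrative assumptions, not the paper's exact formulas.

```python
from dataclasses import dataclass

# Hypothetical action pairs treated as mutually contradictory (assumption).
OPPOSITE_ACTIONS = {("stop", "go"), ("go", "stop"), ("yield", "accelerate"), ("accelerate", "yield")}


@dataclass
class Sample:
    scene_objects: set[str]    # objects annotated in the scene (ground truth)
    salient_objects: set[str]  # subset relevant to the driving decision
    cot_objects: set[str]      # objects the CoT mentions
    cot_action: str            # action the CoT states, e.g. "stop"
    traj_action: str           # action label derived from the predicted trajectory


def observational_metrics(samples: list[Sample]) -> dict[str, float]:
    """Dataset-level averages of four illustrative CoT-action metrics."""
    n = len(samples)
    # Mentioning: fraction of salient objects the CoT refers to.
    mention = sum(
        len(s.cot_objects & s.salient_objects) / len(s.salient_objects) if s.salient_objects else 1.0
        for s in samples
    ) / n
    # Hallucination: fraction of mentioned objects absent from the scene.
    hallucination = sum(
        len(s.cot_objects - s.scene_objects) / len(s.cot_objects) if s.cot_objects else 0.0
        for s in samples
    ) / n
    # Contradiction: the CoT states an action opposite to the executed one.
    contradiction = sum((s.cot_action, s.traj_action) in OPPOSITE_ACTIONS for s in samples) / n
    # Action alignment: the CoT states exactly the executed action.
    alignment = sum(s.cot_action == s.traj_action for s in samples) / n
    return {"mentioning": mention, "hallucination": hallucination,
            "contradiction": contradiction, "action_alignment": alignment}


samples = [
    Sample({"pedestrian", "car"}, {"pedestrian"}, {"pedestrian", "cyclist"}, "stop", "stop"),
    Sample({"car"}, {"car"}, set(), "go", "stop"),
]
print(observational_metrics(samples))
# {'mentioning': 0.5, 'hallucination': 0.25, 'contradiction': 0.5, 'action_alignment': 0.5}
```

These scores only measure agreement between the CoT and the output; a perfect score says nothing about whether the CoT caused the action.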
In practice
- Use CoT intervention to test causal influence.
- Evaluate CoT for relevance, consistency, and causality.
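A CoT intervention can be sketched as a counterfactual test: hold the scene fixed, swap the CoT for an edited one, and measure how far the predicted trajectory moves. This sketch assumes a hypothetical `model(scene, cot)` interface that returns waypoints, uses mean L2 waypoint displacement as the effect size, and splits effects by a hypothetical per-scene salience score. The two toy models show a CoT that is ignored entirely (epiphenomenal) and one whose influence is gated; the gating direction in `uses_cot` is arbitrary and is not a claim about any model in the study.

```python
import math
from typing import Callable, Sequence

# (x, y) waypoints of a predicted trajectory
Trajectory = list[tuple[float, float]]
# Hypothetical interface: a VLA model conditioned on a scene and a CoT string
Model = Callable[[dict, str], Trajectory]


def mean_displacement(a: Trajectory, b: Trajectory) -> float:
    """Average L2 distance between corresponding waypoints."""
    return sum(math.dist(p, q) for p, q in zip(a, b)) / len(a)


def intervention_effects(model: Model, scenes: Sequence[dict],
                         original_cots: Sequence[str],
                         counterfactual_cots: Sequence[str]) -> list[float]:
    """Per-scene trajectory shift caused by replacing the CoT, scene held fixed."""
    effects = []
    for scene, cot, cf_cot in zip(scenes, original_cots, counterfactual_cots):
        baseline = model(scene, cot)
        intervened = model(scene, cf_cot)
        effects.append(mean_displacement(baseline, intervened))
    return effects


def _mean(xs: list[float]) -> float:
    return sum(xs) / len(xs) if xs else float("nan")


def effects_by_salience(effects: Sequence[float], salience: Sequence[float],
                        threshold: float = 0.5) -> dict[str, float]:
    """Split intervention effects into high- and low-salience scenes."""
    high = [e for e, s in zip(effects, salience) if s >= threshold]
    low = [e for e, s in zip(effects, salience) if s < threshold]
    return {"high_salience": _mean(high), "low_salience": _mean(low)}


# Toy model 1: ignores the CoT entirely (epiphenomenal reasoning).
def ignores_cot(scene: dict, cot: str) -> Trajectory:
    return [(0.0, float(t)) for t in range(1, 4)]


# Toy model 2: obeys a "stop" in the CoT only when visual salience is low
# (arbitrary gating rule, for illustration only).
def uses_cot(scene: dict, cot: str) -> Trajectory:
    speed = 0.0 if ("stop" in cot and scene["salience"] < 0.5) else 1.0
    return [(0.0, speed * t) for t in range(1, 4)]


scenes = [{"salience": 0.2}, {"salience": 0.9}]
original = ["clear road, go ahead", "clear road, go ahead"]
counterfactual = ["pedestrian ahead, stop", "pedestrian ahead, stop"]

for name, model in [("ignores_cot", ignores_cot), ("uses_cot", uses_cot)]:
    effects = intervention_effects(model, scenes, original, counterfactual)
    print(name, effects, effects_by_salience(effects, [s["salience"] for s in scenes]))
# ignores_cot [0.0, 0.0] {'high_salience': 0.0, 'low_salience': 0.0}
# uses_cot [2.0, 0.0] {'high_salience': 0.0, 'low_salience': 2.0}
```

Mean waypoint displacement is just one simple effect size; a categorical measure, such as whether the trajectory-derived action label flips under intervention, would play the same role.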
Topics
- VLADriveBench
- Vision-Language-Action Models
- Chain-of-Thought Reasoning
- Autonomous Driving
- Model Evaluation
- Causal Inference
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.