Revealing Interpretable Failure Modes of VLMs

2026-03-07 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Revelio is a framework for systematically uncovering interpretable failure modes in Vision-Language Models (VLMs) used in safety-critical applications such as autonomous driving and indoor robotics. It defines a failure mode as consistently incorrect VLM behavior under a specific composition of domain-relevant concepts, such as "pedestrian proximity" or "adverse weather conditions." Because the number of possible compositions grows exponentially with the number of concepts, Revelio searches this space with two strategies: a diversity-aware beam search that maps the failure landscape efficiently, and Gaussian Process-based Thompson Sampling that explores it more broadly. In both domains, Revelio identified previously unreported vulnerabilities in state-of-the-art VLMs, including weak spatial grounding, failure to account for major obstructions, and excessive conservatism. Within the same evaluation budget, it discovered 3 to 5 times more failure modes than unguided random search, giving concrete targets for VLM safety improvements.
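
The failure-mode definition can be made operational in a few lines. This is a minimal illustration, not Revelio's implementation: it assumes a composition is a set of concept strings, that "consistent" means the VLM is wrong on at least a threshold fraction of sampled scenes (a hypothetical 0.8 here), and that render_scene, query_vlm, and ground_truth are hypothetical callables standing in for a scenario generator, the VLM under test, and a rule-based labeler. The toy stand-ins are invented for illustration.

```python
import random
from typing import Callable, FrozenSet

Concepts = FrozenSet[str]
Scene = dict  # hypothetical scene description produced by a simulator or generator


def failure_rate(
    concepts: Concepts,
    render_scene: Callable[[Concepts, random.Random], Scene],
    query_vlm: Callable[[Scene], str],
    ground_truth: Callable[[Scene], str],
    n_samples: int = 20,
    seed: int = 0,
) -> float:
    """Fraction of sampled scenes under `concepts` where the VLM disagrees with the rule-based label."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(n_samples):
        scene = render_scene(concepts, rng)
        if query_vlm(scene) != ground_truth(scene):
            errors += 1
    return errors / n_samples


def is_failure_mode(rate: float, threshold: float = 0.8) -> bool:
    """Treat a composition as a failure mode only if the VLM is wrong consistently (threshold is hypothetical)."""
    return rate >= threshold


# --- Toy stand-ins (hypothetical, for illustration only) ---
def toy_render(concepts: Concepts, rng: random.Random) -> Scene:
    near = "pedestrian proximity" in concepts
    return {
        "pedestrian_distance_m": rng.uniform(2, 8) if near else rng.uniform(20, 50),
        "adverse_weather": "adverse weather" in concepts,
    }


def toy_ground_truth(scene: Scene) -> str:
    # Rule: brake whenever a pedestrian is closer than 10 m.
    return "brake" if scene["pedestrian_distance_m"] < 10 else "proceed"


def toy_vlm(scene: Scene) -> str:
    # A deliberately flawed "VLM" that never brakes in adverse weather.
    return "proceed" if scene["adverse_weather"] else toy_ground_truth(scene)


both = frozenset({"pedestrian proximity", "adverse weather"})
rate = failure_rate(both, toy_render, toy_vlm, toy_ground_truth)
print(rate, is_failure_mode(rate))  # 1.0 True
```

In this toy setup each concept on its own yields a failure rate of 0.0, while the pair fails on every sample, which is exactly the kind of compositional failure the definition targets.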

Key takeaway

For AI engineers and research scientists developing or deploying VLMs in safety-critical systems, Revelio offers a systematic way to identify and understand vulnerabilities before deployment. Consider integrating its concept-based evaluation to move beyond anecdotal failures: it pinpoints the specific combinations of conditions that trigger consistent VLM errors. Because each failure mode is expressed in interpretable concepts, the results point directly to targeted model improvements and more effective safety alignment.

Key insights

Revelio systematically uncovers interpretable VLM failure modes by searching concept compositions in safety-critical domains.
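
To see why exhaustive evaluation is impractical, count the candidate compositions: with n concepts and compositions capped at K concepts, there are C(n,1) + C(n,2) + ... + C(n,K) of them, and 2^n - 1 without a cap. A quick check, using illustrative values of n and K that are not taken from the paper:

```python
from math import comb


def num_compositions(n_concepts: int, max_size: int) -> int:
    """Count non-empty concept sets containing at most `max_size` distinct concepts."""
    return sum(comb(n_concepts, k) for k in range(1, max_size + 1))


print(num_compositions(20, 3))   # 1350
print(num_compositions(20, 20))  # 1048575, i.e. 2**20 - 1
print(num_compositions(50, 4))   # 251175
```

At a hypothetical 20 rendered scenes per composition, exhaustively evaluating the last case would take over 5 million VLM queries, which is why a guided search under a fixed budget matters.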

Principles

Failure modes are compositions of interpretable, domain-relevant concepts.
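Active vulnerability search must balance discovering new failures with keeping generated scenarios semantically realistic.
Combining concepts can change VLM failure rates non-linearly: a composition may fail far more, or far less, often than its individual concepts would suggest.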
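
One simple way to quantify a non-linear effect is to compare a composition's measured failure rate with what its individual concepts would predict if they acted independently. This sketch assumes a noisy-OR baseline (the composition fails if any concept independently triggers a failure); that is an illustrative choice, not necessarily how Revelio measures interactions, and the rates are made up.

```python
from typing import Sequence


def independent_baseline(single_rates: Sequence[float]) -> float:
    """Failure rate expected if each concept triggered failures independently (noisy-OR)."""
    survive = 1.0
    for p in single_rates:
        survive *= 1.0 - p
    return 1.0 - survive


def interaction(joint_rate: float, single_rates: Sequence[float]) -> float:
    """> 0: concepts compound each other; < 0: one concept masks the other."""
    return joint_rate - independent_baseline(single_rates)


# Illustrative numbers, not results from the paper.
print(round(independent_baseline([0.10, 0.20]), 2))  # 0.28
print(round(interaction(0.65, [0.10, 0.20]), 2))     # 0.37
```

A large positive interaction flags a composition worth reporting even when none of its parts looks dangerous alone.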

Method

Revelio explores concept combinations with a diversity-aware beam search and Gaussian Process-based Thompson Sampling. For each combination, it generates scenarios with a simulator such as CARLA or with an image-generation model, then evaluates the VLM's responses against rule-based ground truth.
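
The summary does not specify how Revelio's beam search enforces diversity, so the following is a minimal sketch under stated assumptions: compositions grow by one concept per step, score is a hypothetical callable returning a composition's measured failure rate, and diversity comes from a greedy Jaccard-overlap penalty (a maximal-marginal-relevance-style heuristic swapped in for illustration). diversity_weight and the toy scorer are hypothetical.

```python
from typing import Callable, FrozenSet, Iterable, List, Set

Concepts = FrozenSet[str]


def jaccard(a: Concepts, b: Concepts) -> float:
    """Overlap between two concept sets (1.0 = identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def diversity_beam_search(
    vocab: Iterable[str],
    score: Callable[[Concepts], float],  # e.g. measured failure rate of a composition
    beam_width: int = 3,
    max_size: int = 3,
    diversity_weight: float = 0.5,
) -> List[Concepts]:
    """Grow compositions one concept at a time, keeping a high-scoring but varied beam."""
    concepts = sorted(set(vocab))
    beam: List[Concepts] = [frozenset()]
    seen: Set[Concepts] = set()
    kept: List[Concepts] = []
    for _ in range(max_size):
        # Expand each beam member by one concept it does not already contain.
        candidates = {c | {v} for c in beam for v in concepts if v not in c} - seen
        seen |= candidates
        scores = {c: score(c) for c in candidates}  # one evaluation per new composition
        # Deterministic order so ties break the same way on every run.
        pool = sorted(candidates, key=lambda c: (-scores[c], sorted(c)))
        beam = []
        while pool and len(beam) < beam_width:
            # Greedy pick: high score, minus a penalty for overlap with what is already kept.
            best = max(
                pool,
                key=lambda c: scores[c]
                - diversity_weight * max((jaccard(c, b) for b in beam), default=0.0),
            )
            beam.append(best)
            pool.remove(best)
        kept.extend(beam)
    return kept


# Toy scorer (hypothetical): only the fog + pedestrian combination fails often.
def toy_score(c: Concepts) -> float:
    return 0.9 if {"fog", "pedestrian"} <= c else 0.1


for comp in diversity_beam_search(["fog", "night", "occlusion", "pedestrian"], toy_score):
    print(sorted(comp), toy_score(comp))
```

With diversity_weight=0 the beam tends to fill with near-duplicates of the best composition; the penalty trades a little score for coverage of distinct regions of the failure landscape.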

In practice

Use Revelio to identify VLM vulnerabilities in autonomous driving.
Apply Revelio to evaluate VLM hazard detection in indoor robotics scenarios.
Encode concept sets as multi-hot binary vectors for GP modeling.
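
The multi-hot encoding gives a GP a fixed-length input, which makes Thompson Sampling straightforward: fit a GP to the failure rates observed so far, draw one function from the posterior over a pool of unevaluated compositions, and evaluate the composition where that draw is highest. The pure-Python sketch below shows one such step. The RBF kernel (which on binary vectors depends only on Hamming distance), the constant prior mean, the noise level, and all names are assumptions for illustration rather than Revelio's configuration; a real implementation would more likely rely on a GP library such as scikit-learn's GaussianProcessRegressor, whose sample_y method draws posterior samples.

```python
import math
import random
from typing import AbstractSet, List, Sequence

Vector = List[float]


def multi_hot(concepts: AbstractSet[str], vocab: Sequence[str]) -> Vector:
    """Encode a concept set as a binary vector over a fixed concept vocabulary."""
    return [1.0 if v in concepts else 0.0 for v in vocab]


def rbf(x: Vector, y: Vector, lengthscale: float = 1.0) -> float:
    # On multi-hot vectors the squared distance equals the Hamming distance.
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2 * lengthscale**2))


def cholesky(a: List[Vector]) -> List[Vector]:
    """Lower-triangular L with L @ L.T == a (a must be symmetric positive definite)."""
    n = len(a)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L


def solve_lower(L: List[Vector], b: Vector) -> Vector:
    """Forward substitution: solve L x = b for lower-triangular L."""
    x: Vector = []
    for i, row in enumerate(L):
        x.append((b[i] - sum(row[k] * x[k] for k in range(i))) / row[i])
    return x


def thompson_pick(observed, rates, candidates, vocab, noise=1e-2, lengthscale=1.0, seed=0):
    """One GP Thompson Sampling step: sample f from the posterior over candidates, return argmax."""
    rng = random.Random(seed)
    X = [multi_hot(c, vocab) for c in observed]
    S = [multi_hot(c, vocab) for c in candidates]
    mu0 = sum(rates) / len(rates)  # constant prior mean = average observed failure rate
    y = [r - mu0 for r in rates]
    K = [[rbf(a, b, lengthscale) + (noise if i == j else 0.0) for j, b in enumerate(X)]
         for i, a in enumerate(X)]
    L = cholesky(K)
    w = solve_lower(L, y)
    V = [solve_lower(L, [rbf(a, s, lengthscale) for a in X]) for s in S]  # V[j] = L^-1 k(X, s_j)
    mean = [mu0 + sum(vi * wi for vi, wi in zip(v, w)) for v in V]
    cov = [[rbf(si, sj, lengthscale) - sum(p * q for p, q in zip(vi, vj))
            + (1e-6 if i == j else 0.0)  # jitter keeps the Cholesky factorization stable
            for j, (sj, vj) in enumerate(zip(S, V))]
           for i, (si, vi) in enumerate(zip(S, V))]
    Ls = cholesky(cov)
    z = [rng.gauss(0.0, 1.0) for _ in S]
    sample = [m + sum(Ls[i][k] * z[k] for k in range(i + 1)) for i, m in enumerate(mean)]
    return candidates[max(range(len(candidates)), key=lambda i: sample[i])]


vocab = ["fog", "night", "occlusion", "pedestrian"]
observed = [{"fog"}, {"night"}, {"fog", "pedestrian"}]
rates = [0.10, 0.15, 0.70]  # illustrative measured failure rates
pool = [{"night", "pedestrian"}, {"fog", "night"}, {"fog", "occlusion", "pedestrian"}]
print(thompson_pick(observed, rates, pool, vocab))
```

Drawing a fresh posterior sample each round (for example, a new seed per call) lets uncertain compositions win occasionally even when their posterior mean is modest, which is the source of the broader exploration compared with a greedy beam.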

Topics

Vision-Language Model Safety
Interpretable Failure Modes
Revelio Framework
Autonomous Driving
Indoor Robotics

Code references

uiuc-focal/Revelio

Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.