Mechanisms of Prompt-Induced Hallucination in Vision-Language Models

2026-04-21 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Vision-Language Models, Model Interpretability · Depth: Expert, extended

Summary

A study by Rudman et al. investigates prompt-induced hallucinations (PIH) in large vision-language models (VLMs), in which a model follows the textual prompt even when it conflicts with the visual evidence. The researchers use a controlled object-counting task with misaligned prompts (e.g., asking for four waterlilies when only three are present). They find that VLMs often correct the prompt's overestimate at low object counts but increasingly conform to the prompt as the number of objects grows, even when the discrepancy is large. Through mechanistic analysis of three VLMs (LLaVA-OneVision-7B, Qwen2-VL-7B, and Janus-Pro-7B), the study identifies a small set of attention heads, termed PIH-heads, whose ablation reduces hallucinations by at least 40% without additional training. These PIH-heads, located primarily in early language model layers, mediate prompt copying. Ablating them increases reliance on visual evidence, and the effect generalizes beyond counting to tasks such as color identification, with up to a 94.25% reduction in prompt-color copying.
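To make the misaligned-prompt setup concrete, here is a minimal sketch of how such a counting prompt could be built. The prompt template, the function name `misaligned_prompt`, and the `offset` parameter are hypothetical illustrations, not the wording or code used in the study.

```python
# Hypothetical helper: build a counting prompt whose stated count
# deliberately disagrees with the true number of objects in the image.
NUMBER_WORDS = {
    1: "one", 2: "two", 3: "three", 4: "four", 5: "five",
    6: "six", 7: "seven", 8: "eight", 9: "nine", 10: "ten",
}


def misaligned_prompt(obj_plural, true_count, offset=1):
    """Return (prompted_count, prompt) where prompted_count = true_count + offset.

    The template below is an assumption made for illustration only.
    """
    if offset == 0:
        raise ValueError("offset must be non-zero for a misaligned prompt")
    prompted = true_count + offset
    if prompted not in NUMBER_WORDS:
        raise ValueError(f"no number word for count {prompted}")
    prompt = f"Describe the {NUMBER_WORDS[prompted]} {obj_plural} in the image."
    return prompted, prompt


# The example from the summary: three waterlilies in the image, four in the prompt.
prompted_count, prompt = misaligned_prompt("waterlilies", true_count=3)
print(prompted_count, prompt)
```

A model that answers with the prompted count rather than the true count on such an input would be counted as exhibiting PIH.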

Key takeaway

For AI Engineers and Research Scientists developing or deploying VLMs, understanding and mitigating prompt-induced hallucinations is critical for reliability. Consider targeted attention head ablations, particularly in early language model layers, to reduce text-over-vision bias. In the study, this approach improved visual grounding without any retraining, which suggests it can make models more robust when prompts are noisy or inconsistent with the image.

Key insights

VLMs hallucinate by prioritizing text over vision, a behavior traceable to specific, ablatable attention heads.

Principles

PIH increases with object count.
PIH-heads are concentrated in early LM layers.
Ablation shifts reliance to visual evidence.

Method

Identify PIH-mediating attention heads via mean ablation, then ablate these heads to reduce prompt-induced hallucinations and enhance visual grounding.
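The mean-ablation step can be sketched on a toy tensor. This is a minimal NumPy illustration, not the study's implementation: the tensor layout `(batch, seq, n_heads, d_head)`, the choice to average over both batch and sequence positions, and the name `mean_ablate_heads` are all assumptions.

```python
import numpy as np


def mean_ablate_heads(head_out, mean_head_out, heads):
    """Replace the outputs of selected attention heads with a reference mean.

    head_out:      array (batch, seq, n_heads, d_head), per-head outputs of one layer
    mean_head_out: array (n_heads, d_head), each head's mean output on a reference set
    heads:         iterable of head indices to ablate
    Returns a new array; the input is left unchanged.
    """
    ablated = head_out.copy()
    for h in heads:
        # Broadcast the (d_head,) mean over every example and position.
        ablated[:, :, h, :] = mean_head_out[h]
    return ablated


rng = np.random.default_rng(0)

# Toy reference activations used only to estimate each head's mean output.
reference = rng.normal(size=(32, 10, 8, 16))
mean_head_out = reference.mean(axis=(0, 1))  # shape (8, 16)

# Toy activations for the inputs being evaluated.
x = rng.normal(size=(4, 10, 8, 16))
ablated = mean_ablate_heads(x, mean_head_out, heads=[2, 5])
```

Unlike zeroing a head, replacing its output with a mean keeps the downstream activations closer to their usual range while removing the input-specific information that head carries. In a real model the replacement would be applied inside the forward pass, for example through a hook on the attention module.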

In practice

Ablate PIH-heads to reduce VLM hallucinations.
Focus on early LM layers for intervention.
Test ablation effects across diverse tasks.
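When testing ablation on a new task, one simple check is to compare how often the model copies the prompted value before and after the intervention. The sketch below uses made-up toy answers. The helper names and the relative-reduction formula are assumptions, since this summary does not state how the study's percentages are computed.

```python
def copy_rate(answers, prompted):
    """Fraction of answers equal to the (misaligned) value stated in the prompt."""
    if len(answers) != len(prompted) or not answers:
        raise ValueError("answers and prompted must be non-empty and equal length")
    return sum(a == p for a, p in zip(answers, prompted)) / len(answers)


def relative_reduction(before, after):
    """Relative drop in copy rate, as a fraction of the baseline rate."""
    if before == 0:
        raise ValueError("baseline copy rate is zero; reduction is undefined")
    return (before - after) / before


# Toy color-identification example: each prompt names a color that
# does not match the image (all values are invented for illustration).
prompted = ["red", "blue", "green", "red", "yellow"]
baseline_answers = ["red", "blue", "green", "red", "purple"]    # copies 4 of 5
ablated_answers = ["red", "orange", "green", "black", "purple"]  # copies 2 of 5

base = copy_rate(baseline_answers, prompted)
abl = copy_rate(ablated_answers, prompted)
reduction = relative_reduction(base, abl)
print(f"copy rate {base:.2f} -> {abl:.2f}, relative reduction {reduction:.0%}")
```

Running the same comparison per task (counting, color, and so on) shows whether the heads found on one task also mediate prompt copying on others.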

Topics

Vision-Language Models
Prompt-Induced Hallucinations
Attention Head Ablation
Mechanistic Interpretability
Object Counting Task

Code references

michalg04/prompt-induced_hallucinations

Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.