When Prompts Override Vision: Prompt-Induced Hallucinations in LVLMs

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computer Vision, Natural Language Processing) · Depth: Expert, quick

Summary

HalluScope is a new benchmark for investigating prompt-induced hallucinations in large vision-language models (LVLMs). The research finds that LVLM hallucinations arise primarily from over-reliance on textual priors and background knowledge, particularly when that information is embedded in the textual instructions. To address this, the authors propose HalluVL-DPO, a fine-tuning framework based on preference optimization that uses a specially curated training dataset to steer LVLMs toward more visually grounded responses. The optimized model reduces the targeted hallucination failures while maintaining or improving performance on existing hallucination benchmarks and visual capability assessments. The authors state that the benchmark, training dataset, and code will be publicly released to foster further research.

Key takeaway

For AI engineers and research scientists developing or deploying LVLMs, the critical point is that priors carried in the text prompt can themselves induce hallucinations. Consider integrating HalluVL-DPO or a similar preference optimization technique into your fine-tuning workflow to mitigate these prompt-induced hallucinations and produce more visually grounded, reliable outputs. Evaluate your models with benchmarks such as HalluScope to identify and address these failure modes specifically.

Key insights

LVLM hallucinations are largely driven by over-reliance on textual instruction priors, not just vision backbone limits.

Method

HalluVL-DPO fine-tunes LVLMs using preference optimization on a curated dataset, guiding models to prefer visually grounded responses over hallucinated ones.
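
To make the idea of preference optimization concrete, here is a minimal sketch of the standard Direct Preference Optimization (DPO) loss for a single preference pair. It is an illustration under assumptions, not the authors' implementation: the summary does not say whether HalluVL-DPO uses this exact objective, and the function names, the `beta` value, and the example log-probabilities are all hypothetical. Here "chosen" stands for the visually grounded response and "rejected" for the hallucinated one.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical value; controls how far the policy may drift from the reference
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response under
    either the model being trained (policy) or a frozen reference model.
    """
    # Implicit rewards: how much more likely each response is under the policy than the reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the policy favors the chosen (grounded) response over the rejected one.
    margin = beta * (chosen_reward - rejected_reward)
    return -log_sigmoid(margin)


# Hypothetical log-probabilities, for illustration only.
loss_at_reference = dpo_loss(-5.0, -7.0, -5.0, -7.0)   # policy identical to reference
loss_after_shift = dpo_loss(-4.0, -8.0, -5.0, -7.0)    # policy now favors the grounded response
print(loss_at_reference, loss_after_shift)
```

When the policy matches the reference, the margin is zero and the loss equals log 2; shifting probability toward the grounded response and away from the hallucinated one lowers it. In a real fine-tuning run the log-probabilities would come from the LVLM conditioned on both the image and the prompt, and the loss would be averaged over a batch of preference pairs.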

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.