Look Again Before You Abstain: Budgeted Conformal Evidence Acquisition for Reliable Vision-Language Model

Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

Large Vision-Language Models (LVLMs) frequently hallucinate, asserting visual details not supported by images. Existing selective prediction methods, while offering distribution-free guarantees on hallucination rates, achieve this at a high cost, requiring abstention on over 80% of claims to maintain a hallucination rate below 5% on object-existence benchmarks. To mitigate this waste, Budgeted Conformal Evidence Acquisition (BCEA) introduces a three-way decision: answer, abstain, or acquire additional visual evidence through re-examination (zooming, cropping, claim-specific interventions) under a bounded compute budget. A critical observation is that naive evidence acquisition breaks conformal calibration's statistical guarantees, causing realized risk to overshoot the target by up to 17 points. BCEA addresses this by folding the entire acquisition policy into the score function and re-calibrating, which restores finite-sample guarantees and improves coverage. Tested on POPE and COCO benchmarks with four open VLMs, BCEA effectively controls hallucination rates and consistently enhances coverage over guaranteed-abstention baselines.

Key takeaway

For machine learning engineers deploying Large Vision-Language Models: if high abstention rates are the price you pay for hallucination guarantees, consider Budgeted Conformal Evidence Acquisition (BCEA). It lets the model acquire additional visual evidence, such as zooming or cropping, within a compute budget, improving coverage without sacrificing statistical reliability. Fold the acquisition policy directly into the score function and recalibrate on the resulting scores; acquiring evidence without recalibrating breaks the finite-sample guarantees.

Key insights

BCEA improves LVLM reliability by acquiring more visual evidence under budget, restoring statistical guarantees through recalibration.

Principles

Method

BCEA replaces binary answer/abstain with a three-way choice: answer, abstain, or acquire additional visual evidence (zoom, crop, claim-specific intervention) within a budget, then recalibrates post-acquisition scores.
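The idea of calibrating on post-acquisition scores can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's procedure: the function names (`policy_score`, `conformal_threshold`, `decide`), the "uncertain band" trigger for taking another look, and the class-conditional split-conformal threshold (computed on calibration claims known to be hallucinated) are all hypothetical stand-ins. The point it shows is that the threshold is computed on scores produced by the same acquisition policy that will run at test time.

```python
import math
from typing import Sequence, Tuple


def policy_score(initial: float, looks: Sequence[float],
                 budget: int, band: Tuple[float, float]) -> float:
    """End-to-end score of a (hypothetical) acquisition policy.

    `initial` is the first-look confidence that the claim is supported.
    `looks[i]` is the confidence after the (i+1)-th re-examination
    (zoom, crop, ...). While the current score sits in the uncertain
    band [lo, hi) and budget remains, take another look.
    """
    lo, hi = band
    score = initial
    for refreshed in looks[:budget]:
        if not (lo <= score < hi):
            break  # confident enough either way: stop spending budget
        score = refreshed
    return score


def conformal_threshold(false_claim_scores: Sequence[float],
                        alpha: float) -> float:
    """Split-conformal threshold from hallucinated calibration claims.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, or +inf
    when n is too small to certify level alpha. If a new hallucinated
    claim is exchangeable with the calibration ones, its score exceeds
    this threshold with probability at most alpha.
    """
    n = len(false_claim_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(false_claim_scores)[k - 1]


def decide(score: float, tau: float) -> str:
    """Final decision once acquisition has finished."""
    return "answer" if score > tau else "abstain"


# Calibration: run the SAME policy on held-out hallucinated claims,
# then threshold their final (post-acquisition) scores.
band, budget = (0.3, 0.7), 2
calibration = [(0.5, [0.2, 0.1]), (0.4, [0.6, 0.35]), (0.1, [])]
final_scores = [policy_score(s0, looks, budget, band)
                for s0, looks in calibration]
tau = conformal_threshold(final_scores, alpha=0.25)

# Test time: identical policy, then threshold.
test_score = policy_score(0.55, [0.9], budget, band)
print(decide(test_score, tau))
```

Calibrating on first-look scores and then thresholding post-acquisition scores would mix two different score distributions, which is the kind of mismatch the summary describes as breaking the guarantee. Treating the whole policy as the score function keeps calibration and test scores exchangeable.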

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.