When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias
Summary
A new analysis reveals that Vision-Language Models (VLMs) used as judges for automatic evaluation often exhibit an "informativeness bias": they prefer the more informative answer over the one that is correct with respect to the image. This happens even when the VLM recognizes that an answer conflicts with the image content, which significantly compromises judge reliability. To counter this, the researchers propose BIRCH (Balanced Informativeness and CoRrectness with a Truthful AnCHor). This judging paradigm first corrects image inconsistencies in the candidate answers, then compares the candidates against this corrected version. The corrected version redirects the VLM's focus to image-grounded correctness. Experiments across multiple models and benchmarks show that BIRCH reduces informativeness bias by up to 17% and yields performance gains of up to 9.8%. The results point to a critical flaw in current VLM-as-a-Judge systems.
Key takeaway
For AI Engineers evaluating VLM performance, recognizing and mitigating "informativeness bias" is crucial. Consider adopting the BIRCH paradigm so that your VLM-as-a-Judge systems prioritize image-grounded correctness over informativeness alone. The authors report that this improves evaluation reliability, with performance gains of up to 9.8%. More reliable judges should in turn give more accurate model comparisons and support faster scientific progress.
Key insights
VLM-as-a-Judge systems often prioritize informative answers over image-grounded correctness, leading to "informativeness bias."
Principles
- VLM judges can favor informative answers that conflict with the image, even when they detect the conflict.
- Bias undermines VLM evaluation reliability.
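To make the bias concrete, one way to quantify it is sketched below in Python. The summary does not say how the authors measure informativeness bias, so this metric is a hypothetical diagnostic, not the paper's definition. The record type `JudgedPair` and the function `informativeness_bias_rate` are illustrative names. The metric considers only pairs where the more informative answer and the image-grounded correct answer differ. Among those pairs, it counts how often the judge sided with informativeness.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class JudgedPair:
    """One pairwise judgment (hypothetical record format, not from the paper)."""
    informative_idx: int  # index (0 or 1) of the more informative answer
    correct_idx: int      # index of the image-grounded correct answer
    judge_choice: int     # index the VLM judge preferred


def informativeness_bias_rate(pairs: Iterable[JudgedPair]) -> float:
    """Fraction of conflict pairs where the judge picked the informative-but-incorrect answer.

    Only pairs where informativeness and correctness point to different answers
    are counted; returns 0.0 when there are no such pairs.
    """
    conflicts = [p for p in pairs if p.informative_idx != p.correct_idx]
    if not conflicts:
        return 0.0
    biased = sum(1 for p in conflicts if p.judge_choice == p.informative_idx)
    return biased / len(conflicts)
```

In this sketch, a perfectly image-grounded judge scores 0.0, and a judge that always follows informativeness scores 1.0. Pairs without a conflict are excluded so that they do not dilute the measure.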
Method
BIRCH corrects image inconsistencies in candidate answers, then compares them against the corrected version to shift judge focus to image-grounded correctness.
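The two-step flow described above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The VLM is modeled as a hypothetical callable that takes `(image_bytes, prompt)` and returns text. Several other details are also assumed: the prompt wording, the choice to build a single anchor from both candidates, and the "A"/"B" verdict format. The helpers `build_anchor` and `birch_judge` are illustrative names.

```python
from typing import Callable

# Hypothetical VLM interface: (image_bytes, prompt) -> text response.
VLM = Callable[[bytes, str], str]


def build_anchor(vlm: VLM, image: bytes, question: str, answers: list[str]) -> str:
    """Step 1 (assumed form): have the VLM merge the candidates into one
    answer with every image-inconsistent statement corrected."""
    listing = "\n".join(f"Answer {chr(65 + i)}: {a}" for i, a in enumerate(answers))
    prompt = (
        f"Question: {question}\n{listing}\n"
        "Correct every statement above that conflicts with the image "
        "and return a single corrected answer."
    )
    return vlm(image, prompt).strip()


def birch_judge(vlm: VLM, image: bytes, question: str, answer_a: str, answer_b: str) -> str:
    """Step 2: judge both candidates against the corrected anchor.

    Returns 'A' or 'B'; raises ValueError if the reply cannot be parsed.
    """
    anchor = build_anchor(vlm, image, question, [answer_a, answer_b])
    prompt = (
        f"Question: {question}\nReference (image-corrected): {anchor}\n"
        f"Answer A: {answer_a}\nAnswer B: {answer_b}\n"
        "Which answer agrees better with the reference? Reply with A or B only."
    )
    verdict = vlm(image, prompt).strip().upper()
    if verdict not in {"A", "B"}:
        raise ValueError(f"unparseable verdict: {verdict!r}")
    return verdict
```

The judge is asked about agreement with the corrected reference rather than overall quality. This framing is meant to anchor the comparison on image-grounded correctness, so extra but wrong detail in a candidate stops counting in its favor.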
In practice
- Implement BIRCH for VLM evaluation.
- Prioritize image-grounded correctness.
Topics
- Vision-Language Models
- VLM-as-a-Judge
- Informativeness Bias
- BIRCH Paradigm
- Model Evaluation
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.