When Vision-Language Models Judge Without Seeing: Exposing Informativeness Bias

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Advanced, medium

Summary

A new analysis finds that Vision-Language Models (VLMs) used as judges for automatic evaluation often exhibit an "informativeness bias": they favor the more informative answer even when it is less faithful to the image. The bias persists even when the VLM itself recognizes that the answer conflicts with the image content, which significantly compromises judge reliability. To counter this, the researchers propose BIRCH (Balanced Informativeness and CoRrectness with a Truthful AnCHor), a judging paradigm that first corrects the parts of the candidate answers that are inconsistent with the image and then compares the candidates against this corrected version, the truthful anchor. This redirects the judge's focus to image-grounded correctness. Experiments across multiple models and benchmarks show that BIRCH reduces informativeness bias by up to 17% and yields performance gains of up to 9.8%. The findings point to a critical flaw in current VLM-as-a-Judge systems.

Key takeaway

For AI engineers evaluating VLM performance, recognizing and mitigating "informativeness bias" is crucial. Consider adopting the BIRCH paradigm so that VLM-as-a-Judge systems prioritize image-grounded correctness over mere informativeness; the authors report bias reductions of up to 17% and performance gains of up to 9.8%. More reliable judging should, in turn, make model comparisons more accurate.

Key insights

VLM-as-a-Judge systems often prioritize informative answers over image-grounded correctness, leading to "informativeness bias."

Method

BIRCH first corrects the parts of the candidate answers that are inconsistent with the image, then compares the candidates against this corrected version (the truthful anchor), which shifts the judge's focus to image-grounded correctness.
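The two-step flow described above can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function name `anchored_judge`, the `build_anchor` and `score` callables, and the choice to build a single anchor from all candidates are assumptions made for the example. In a real system both callables would wrap prompts to the judge VLM; here they are replaced by toy stand-ins that treat the "image" as a string of ground-truth facts so the sketch runs without a model.

```python
from typing import Callable, Sequence

# Hypothetical interfaces (not from the paper): each would wrap a VLM prompt.
BuildAnchor = Callable[[str, str, Sequence[str]], str]
ScoreAgainstAnchor = Callable[[str, str, str, str], float]


def anchored_judge(image: str, question: str, candidates: Sequence[str],
                   build_anchor: BuildAnchor, score: ScoreAgainstAnchor) -> int:
    """Return the index of the candidate that best agrees with the anchor."""
    # Step 1: correct image inconsistencies to obtain a truthful anchor.
    anchor = build_anchor(image, question, candidates)
    # Step 2: compare each candidate against the anchor, not against each other.
    scores = [score(image, question, c, anchor) for c in candidates]
    return max(range(len(candidates)), key=scores.__getitem__)


# Toy stand-ins (assumptions for illustration only): answers are
# semicolon-separated claims and the "image" is a list of true facts.
def _claims(text: str) -> list[str]:
    return [c.strip() for c in text.split(";") if c.strip()]


def toy_build_anchor(image: str, question: str, candidates: Sequence[str]) -> str:
    # Keep only claims that the image supports, dropping duplicates.
    facts = set(_claims(image))
    kept: list[str] = []
    for cand in candidates:
        for claim in _claims(cand):
            if claim in facts and claim not in kept:
                kept.append(claim)
    return "; ".join(kept)


def toy_score(image: str, question: str, candidate: str, anchor: str) -> float:
    # Fraction of the candidate's claims that survive in the anchor.
    claims = _claims(candidate)
    supported = set(_claims(anchor))
    return sum(c in supported for c in claims) / len(claims) if claims else 0.0


image = "red car; two people; sunny"
detailed_but_wrong = "red car; two people; sunny; a dog on the left"
short_and_correct = "red car; two people"
winner = anchored_judge(image, "What is in the photo?",
                        [detailed_but_wrong, short_and_correct],
                        toy_build_anchor, toy_score)
```

In this toy setup the longer answer contains one claim the image does not support, so the shorter, fully grounded answer wins (`winner` is index 1). A judge that rewarded detail alone would likely have preferred the longer answer, which is the failure mode the anchor is meant to counter.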

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.