Vision-Language Models vs Human: Perceptual Image Quality Assessment

2026-03-25 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, quick

Summary

A study investigates how well Vision-Language Models (VLMs) approximate human perceptual judgments in image quality assessment (IQA) across three scales: contrast, colorfulness, and overall preference. Six VLMs, four proprietary and two open-weight, were systematically benchmarked against psychophysical data. The findings show significant attribute-dependent variability: models that excel on colorfulness (Pearson's ρ up to 0.93) often underperform on contrast, and vice versa. Analysis of attribute weighting shows that VLMs prioritize colorfulness over contrast when judging overall preference, mirroring human psychophysical data. Intramodel consistency analysis also reveals a tradeoff: highly self-consistent models are not always the most human-aligned, which suggests that response variability might reflect sensitivity to scene-dependent perceptual cues. Human-VLM agreement improves with increased perceptual separability, indicating better reliability when stimulus differences are distinct.

Key takeaway

Research scientists developing or deploying automated image quality assessment systems should carefully consider which perceptual attributes their VLM needs to evaluate. Do not assume that high performance on one attribute transfers to others, and be aware that a VLM's internal consistency does not guarantee human alignment. Favor scenarios with clear perceptual differences to maximize VLM reliability.

Key insights

VLMs can approximate human image quality judgments, but performance varies significantly by perceptual attribute.

Principles

IQA alignment varies by attribute.
Self-consistency does not equal human alignment.
Perceptual separability improves VLM reliability.
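
The second principle can be illustrated with a minimal sketch. It assumes a pairwise-comparison setup in which each trial is shown to a model several times and the model answers "A" or "B"; that protocol, the function names, and the toy data are all hypothetical and not taken from the study.

```python
from collections import Counter


def majority(runs):
    """Most common response among repeated runs on one trial."""
    return Counter(runs).most_common(1)[0][0]


def self_consistency(runs_per_trial):
    """Mean fraction of repeated runs that match the per-trial majority."""
    fractions = [
        Counter(runs).most_common(1)[0][1] / len(runs)
        for runs in runs_per_trial
    ]
    return sum(fractions) / len(fractions)


def human_alignment(runs_per_trial, human_choices):
    """Fraction of trials where the model's majority matches the human choice."""
    hits = sum(
        majority(runs) == choice
        for runs, choice in zip(runs_per_trial, human_choices)
    )
    return hits / len(human_choices)


# Hypothetical toy data: four trials, three repeated runs per trial.
human = ["A", "B", "A", "B"]
steady_model = [["A", "A", "A"]] * 4  # always answers "A"
varied_model = [
    ["A", "A", "B"],
    ["B", "B", "A"],
    ["A", "B", "A"],
    ["B", "A", "B"],
]

for name, runs in [("steady", steady_model), ("varied", varied_model)]:
    print(name, self_consistency(runs), human_alignment(runs, human))
```

In this toy data the steady model is perfectly self-consistent yet matches the human choice on only half the trials, while the varied model is less self-consistent but matches on all four. Real data may of course show a weaker or different pattern.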

Method

Six VLMs were benchmarked against human psychophysical data on contrast, colorfulness, and overall preference scales. The analysis also covered attribute weighting and intramodel consistency.

In practice

Select VLMs based on the specific IQA attribute being evaluated.
Test VLM reliability on stimuli whose differences are perceptually distinct.
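
One way to check the second point is to split trials by how far apart the two stimuli sit on the human perceptual scale and compare agreement rates in each group. This is a minimal sketch: the threshold, the trial format, and the values are hypothetical.

```python
def agreement_by_separability(trials, threshold):
    """Agreement rate for close vs. distinct stimulus pairs.

    Each trial is (scale_difference, model_agrees_with_human).
    """
    def rate(flags):
        return sum(flags) / len(flags) if flags else float("nan")

    close = [agree for diff, agree in trials if abs(diff) < threshold]
    distinct = [agree for diff, agree in trials if abs(diff) >= threshold]
    return {"close": rate(close), "distinct": rate(distinct)}


# Hypothetical trials: human scale difference and whether the VLM agreed.
trials = [
    (0.1, False),
    (0.2, True),
    (-0.3, False),
    (1.2, True),
    (-1.5, True),
    (2.0, True),
]

rates = agreement_by_separability(trials, threshold=1.0)
print(rates)
```

A markedly higher rate in the distinct group than in the close group would be consistent with the separability finding summarized above.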

Topics

Vision-Language Models
Image Quality Assessment
Human Perception
Psychophysical Experiments
Perceptual Metrics

Best for: Research Scientist, AI Researcher, AI Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.