Vision-Language Models vs Human: Perceptual Image Quality Assessment
Summary
A study investigates how well Vision-Language Models (VLMs) approximate human perceptual judgments in image quality assessment (IQA) across three scales: contrast, colorfulness, and overall preference. Six VLMs, four proprietary and two open-weight, were systematically benchmarked against psychophysical data. The findings indicate significant attribute-dependent variability: models that excel at colorfulness (Pearson's ρ up to 0.93) often underperform on contrast, and vice versa. Analysis of attribute weighting shows that VLMs prioritize colorfulness over contrast when judging overall preference, mirroring human psychophysical data. Intramodel consistency analysis also reveals a tradeoff: highly self-consistent models are not always the most human-aligned, which suggests that response variability might reflect sensitivity to scene-dependent perceptual cues. Human-VLM agreement improves with increased perceptual separability, indicating better reliability when stimulus differences are distinct.
Key takeaway
If you develop or deploy automated image quality assessment systems, consider carefully which perceptual attributes your VLM needs to evaluate. Do not assume that high performance on one attribute translates to others, and be aware that a VLM's internal consistency does not guarantee human alignment. Focus on scenarios with clear perceptual differences to maximize VLM reliability.
Key insights
VLMs can approximate human image quality judgments, but performance varies significantly by perceptual attribute.
Principles
- IQA alignment varies by attribute.
- Self-consistency does not equal human alignment.
- Perceptual separability improves VLM reliability.
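The second principle can be made concrete with a minimal sketch. It assumes, hypothetically, that a model answers the same pairwise comparison trials in several repeated runs and that a single human choice is available per trial. The study's actual consistency measure is not described here, so the function names, the majority-vote definition, and the toy data are illustrative assumptions only.

```python
from collections import Counter


def self_consistency(runs):
    """Mean share of repeated answers that match the per-trial majority answer."""
    per_trial = []
    for answers in zip(*runs):  # answers to one trial across all runs
        top_count = Counter(answers).most_common(1)[0][1]
        per_trial.append(top_count / len(answers))
    return sum(per_trial) / len(per_trial)


def human_agreement(runs, human_choices):
    """Share of trials where the model's majority answer matches the human choice."""
    majority = [Counter(answers).most_common(1)[0][0] for answers in zip(*runs)]
    hits = sum(m == h for m, h in zip(majority, human_choices))
    return hits / len(human_choices)


# Toy data (placeholders, not results from the study):
# three repeated runs over four trials, each answer naming the preferred image.
runs = [
    ["A", "B", "A", "B"],
    ["A", "B", "B", "B"],
    ["A", "B", "A", "B"],
]
human_choices = ["A", "A", "A", "B"]

print("self-consistency:", self_consistency(runs))
print("human agreement:", human_agreement(runs, human_choices))
```

Reporting the two numbers separately keeps the distinction visible: a model can repeat the same answer on every run and still disagree with human observers. An odd number of runs avoids ties in the majority vote.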
Method
VLMs were benchmarked against human psychophysical data across contrast, colorfulness, and overall preference scales, analyzing attribute weighting and intramodel consistency.
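As a rough illustration of this kind of benchmark, the sketch below computes a per-attribute Pearson correlation between human and VLM scale values, and fits a two-predictor least-squares model to estimate how much contrast and colorfulness contribute to overall preference. The function names, the linear weighting model, and all numbers are assumptions made for illustration; the study's exact procedure may differ.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(vx * vy)


def fit_attribute_weights(contrast, colorfulness, overall):
    """Least-squares weights (w_contrast, w_colorfulness) for
    overall ~ w_contrast * contrast + w_colorfulness * colorfulness + intercept."""
    n = len(overall)
    mc, mk, mo = sum(contrast) / n, sum(colorfulness) / n, sum(overall) / n
    c = [x - mc for x in contrast]
    k = [x - mk for x in colorfulness]
    o = [x - mo for x in overall]
    scc = sum(a * a for a in c)
    skk = sum(a * a for a in k)
    sck = sum(a * b for a, b in zip(c, k))
    sco = sum(a * b for a, b in zip(c, o))
    sko = sum(a * b for a, b in zip(k, o))
    det = scc * skk - sck * sck
    if det == 0:
        raise ValueError("predictors are collinear; weights are not identifiable")
    return (sco * skk - sko * sck) / det, (sko * scc - sco * sck) / det


# Placeholder scale values per image (not data from the study).
human = {"contrast": [1.0, 2.0, 3.0, 4.0], "colorfulness": [2.0, 1.0, 4.0, 3.0]}
vlm = {"contrast": [1.2, 1.9, 3.3, 3.8], "colorfulness": [1.8, 1.1, 3.9, 3.2]}

alignment = {attr: pearson(human[attr], vlm[attr]) for attr in human}
print(alignment)
```

Comparing the two fitted weights for the human data and for each model indicates whether a model leans on colorfulness or contrast in the same direction as human observers.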
In practice
- Choose a VLM based on the specific IQA attribute you need to assess.
- Test VLM reliability on stimuli whose differences are clearly distinct.
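One way to check the second point is to group comparison trials by how far apart the two stimuli are and look at the agreement rate per group. The sketch below assumes a hypothetical separability score per trial (for example, the absolute difference between the two stimuli on a human-derived scale) and a flag saying whether the VLM matched the human choice. The bin edges and trial data are placeholders.

```python
from bisect import bisect_right


def agreement_by_separability(trials, edges):
    """Return the human-VLM agreement rate for each separability bin.

    trials: iterable of (separability, agrees) pairs.
    edges:  ascending bin boundaries; values at or above the last edge
            fall into the final bin. Empty bins yield None.
    """
    hits = [0] * (len(edges) + 1)
    totals = [0] * (len(edges) + 1)
    for separability, agrees in trials:
        b = bisect_right(edges, separability)  # index of the bin for this trial
        totals[b] += 1
        hits[b] += int(agrees)
    return [h / t if t else None for h, t in zip(hits, totals)]


# Placeholder trials (not data from the study).
trials = [
    (0.1, False), (0.2, True),   # barely distinguishable pairs
    (0.6, True), (0.7, False),   # moderately different pairs
    (1.4, True), (1.8, True),    # clearly different pairs
]
rates = agreement_by_separability(trials, edges=[0.5, 1.0])
print(rates)
```

If agreement rises from the low-separability bin to the high-separability bin, that is consistent with the pattern reported above; a flat or falling trend would suggest the model is unreliable even on easy comparisons.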
Topics
- Vision-Language Models
- Image Quality Assessment
- Human Perception
- Psychophysical Experiments
- Perceptual Metrics
Best for: Research Scientist, AI Researcher, AI Scientist, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.