P²-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization
Summary
P²-DPO (Perceptual Processing Direct Preference Optimization) is a training paradigm designed to mitigate hallucination in Large Vision-Language Models (LVLMs). It addresses two limitations of existing Direct Preference Optimization (DPO) methods: they often overlook perceptual bottlenecks in attended regions, and they lack Visual Robustness against image degradation. P²-DPO generates and learns from its own on-policy preference pairs, which avoids the problems of vision-agnostic and off-policy data. The method combines on-policy preference pair construction, targeting "Focus-and-Enhance" perception and Visual Robustness, with a Calibration Loss that aligns visual signals with text generation. In the reported experiments, P²-DPO outperforms strong baselines that rely on costly human feedback while using comparable training data and cost. Evaluations on Attention Region Fidelity (ARF) and image degradation scenarios confirm its effectiveness in improving perceptual processing and robustness.
Key takeaway
For AI Scientists and Machine Learning Engineers developing Large Vision-Language Models, P²-DPO presents a compelling strategy to combat hallucination and enhance visual robustness. You should consider its novel on-policy preference pair construction and Calibration Loss as a more efficient alternative to costly human feedback. This approach directly targets perceptual bottlenecks and improves performance in degraded image scenarios, offering a path to more reliable and robust LVLM deployments.
Key insights
P²-DPO reduces LVLM hallucination by self-generating on-policy visual preference pairs and using a calibration loss.
Principles
- Hallucination in LVLMs stems from perceptual bottlenecks.
- On-policy preference learning improves model guidance.
- Visual robustness is critical for LVLM reliability.
Method
P²-DPO constructs on-policy preference pairs targeting "Focus-and-Enhance" perception and Visual Robustness, then applies a Calibration Loss to align visual signals with text generation.
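For orientation, the standard DPO objective that methods of this kind build on can be sketched as follows. This is a minimal illustration of the generic DPO loss for a single preference pair, not the P²-DPO implementation: the summary does not give the form of the Calibration Loss, so it is left out, and the function names, argument names, and the default beta value are assumptions made for this example.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value; the summary does not state one
) -> float:
    """Standard DPO loss for one (chosen, rejected) response pair.

    Each argument is the summed log-probability of a full response,
    conditioned on the same image and prompt, under either the policy
    being trained or the frozen reference model.
    """
    # Implicit rewards: how far the policy has moved from the reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    # The loss shrinks as the chosen response is favored over the rejected one.
    return -log_sigmoid(chosen_reward - rejected_reward)


# Hypothetical log-probabilities for one on-policy pair.
loss = dpo_loss(-8.0, -12.0, -10.0, -10.0)
```

In an on-policy setup, both responses in a pair are sampled from the model being trained, so the log-probabilities above come from the model's own generations and not from an external dataset.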
In practice
- Improve LVLM performance on ARF benchmarks.
- Enhance model robustness to degraded images.
- Reduce reliance on expensive human feedback.
Topics
- Large Vision-Language Models
- Hallucination Mitigation
- Direct Preference Optimization
- Visual Robustness
- On-policy Learning
- Computer Vision
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.