MACD: Model-Aware Contrastive Decoding via Counterfactual Data

Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

Model-aware Contrastive Decoding (MACD) is an inference strategy for reducing hallucinations in Video Large Language Models (Video-LLMs). Video-LLMs, including those from the Qwen and InternVL families, often generate ungrounded content when visual evidence is weak or ambiguous. Traditional contrastive decoding (CD) methods build their contrast inputs from random perturbations; MACD instead uses the Video-LLM's own feedback to identify the specific object regions responsible for hallucination. It then generates targeted, object-level counterfactual inputs and integrates them into the CD process to enforce evidence-grounded token selection. Experiments on the EventHallusion, MVBench, Perception-test, and Video-MME benchmarks show that MACD consistently reduces hallucination while maintaining or improving task accuracy, and that it is particularly effective in challenging scenarios involving small, occluded, or co-occurring objects.

Key takeaway

For AI scientists and ML engineers deploying Video-LLMs who are running into hallucination, MACD offers an inference-time remedy. Consider integrating it to improve factual grounding, especially when visual evidence is ambiguous or object interactions are complex. Because MACD changes only the decoding procedure, it requires no model retraining, which makes it practical to try on an existing deployment.

Key insights

MACD uses the model's own feedback to create targeted counterfactual data, which improves contrastive decoding and reduces Video-LLM hallucination.

Method

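MACD proceeds in five steps: (1) identify the objects in the video; (2) assign each object a soft mask; (3) compute the Video-LLM's prediction loss; (4) use the gradients of that loss to update the masks, which yields a counterfactual video; and (5) perform contrastive decoding using both the original and the counterfactual inputs.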
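The steps above can be sketched on a toy problem. This is a minimal illustration, not the authors' implementation. `toy_logits` is a hypothetical linear stand-in for a Video-LLM, the masks are updated by gradient ascent on the prediction loss using finite differences (a real model would use autograd), and the decoding rule is the (1 + alpha) * original - alpha * counterfactual form with a plausibility cutoff used in earlier contrastive decoding work. The summary does not state MACD's exact loss, update direction, or decoding formula, so all three are assumptions, as are every function and parameter name below.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def toy_logits(video, W):
    # Hypothetical stand-in for a Video-LLM: flattened video -> next-token logits.
    return W @ video.ravel()

def apply_masks(video, regions, mask_logits):
    # Attenuate each object region by its soft mask value in (0, 1).
    keep = np.ones_like(video)
    for region, m in zip(regions, sigmoid(mask_logits)):
        keep = keep * (1.0 - m * region)
    return video * keep

def prediction_loss(video, W, target):
    # Negative log-likelihood of the target token under the toy model.
    return -np.log(softmax(toy_logits(video, W))[target])

def update_masks(video, regions, mask_logits, W, target,
                 lr=1.0, steps=20, eps=1e-4):
    # Assumed update rule: gradient ascent on the loss, so the masks grow on
    # the objects the prediction depends on. Gradients are estimated with
    # central finite differences to keep the sketch dependency-free.
    mask_logits = np.asarray(mask_logits, dtype=float).copy()
    for _ in range(steps):
        grad = np.zeros_like(mask_logits)
        for k in range(len(mask_logits)):
            d = np.zeros_like(mask_logits)
            d[k] = eps
            up = prediction_loss(apply_masks(video, regions, mask_logits + d), W, target)
            dn = prediction_loss(apply_masks(video, regions, mask_logits - d), W, target)
            grad[k] = (up - dn) / (2 * eps)
        mask_logits += lr * grad
    return mask_logits

def contrastive_decode(logits_orig, logits_cf, alpha=1.0, beta=0.1):
    # Keep only tokens that are plausible under the original input, then
    # favour tokens whose score drops when the evidence is masked out.
    p_orig = softmax(logits_orig)
    plausible = p_orig >= beta * p_orig.max()
    scores = (1 + alpha) * logits_orig - alpha * logits_cf
    return int(np.argmax(np.where(plausible, scores, -np.inf)))

# Toy setup: a 4-pixel "video" with two objects and a 3-token vocabulary.
video = np.ones(4)
regions = [np.array([1.0, 1.0, 0.0, 0.0]),   # object 0 covers pixels 0-1
           np.array([0.0, 0.0, 1.0, 1.0])]   # object 1 covers pixels 2-3
W = np.array([[2.0, 2.0, 0.0, 0.0],          # token 0 is driven by object 0
              [0.0, 0.0, 1.0, 1.0],          # token 1 is driven by object 1
              [0.0, 0.0, 0.0, 0.0]])

mask_logits = update_masks(video, regions, np.zeros(2), W, target=0)
counterfactual = apply_masks(video, regions, mask_logits)
token = contrastive_decode(toy_logits(video, W), toy_logits(counterfactual, W))
```

In this toy setup the mask on object 0 grows because attenuating it raises the loss on token 0, while the mask on object 1 shrinks. In a real pipeline the two logit vectors would come from two forward passes of the same Video-LLM, one per input, at every decoding step.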

Best for: Research Scientist, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.