Lost at the End: Primacy Bias in Multimodal Retrieval-Augmented Question Answering

Source: Artificial Intelligence · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A study of multimodal retrieval-augmented question answering, evaluated on knowledge-based visual question answering (KB-VQA), reports a "Lost at the End" primacy bias: vision-language models (VLMs) favor information at the beginning of the retrieved context. This contrasts with the U-shaped "lost-in-the-middle" effect observed in text-only LLMs. Using a controlled "gold-position protocol" on three open-source 7B/8B VLM readers and two KB-VQA benchmarks, the study finds that placing the gold passage first outperforms placing it last by 16 to 26 points. Ablations indicate that the multimodal setting amplifies an existing text-mode primacy by a factor of 2.2 to 4.5, and they localize the effect to prompt slot 0 of instruction-tuned readers. Standard retrieval-side fixes (MMR, oracle reranking, and rank-based reordering) fail to close the gap. The findings suggest that recall@k is an inadequate metric for deployed KB-VQA and that reader-side interventions are needed. The authors release the protocol as an evaluation instrument.

Key takeaway

Machine learning engineers deploying multimodal retrieval-augmented question answering systems should account for the "Lost at the End" primacy bias. Readers heavily favor information presented early in the retrieved context and can overlook crucial details placed at the end. Prioritize reader-side interventions to reduce this positional dependence, and reconsider recall@k as a primary metric: it records whether the gold passage was retrieved, not where it sits in the prompt, so it may not reflect real-world effectiveness.
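To make the point about recall@k concrete, the following minimal sketch compares two retrieval orderings that differ only in where the gold passage lands. The passage identifiers and the helper `first_gold_slot` are illustrative assumptions, not part of the study's released protocol.

```python
from typing import List, Optional, Set


def recall_at_k(ranked_ids: List[str], gold_ids: Set[str], k: int) -> float:
    """Fraction of gold passages that appear anywhere in the top-k list."""
    if not gold_ids:
        raise ValueError("gold_ids must be non-empty")
    return len(set(ranked_ids[:k]) & gold_ids) / len(gold_ids)


def first_gold_slot(ranked_ids: List[str], gold_ids: Set[str], k: int) -> Optional[int]:
    """Prompt slot (0-based) of the first gold passage in the top-k, or None."""
    for slot, passage_id in enumerate(ranked_ids[:k]):
        if passage_id in gold_ids:
            return slot
    return None


# Hypothetical passage IDs: "g" is the gold passage, "d*" are distractors.
gold_first = ["g", "d1", "d2", "d3"]
gold_last = ["d1", "d2", "d3", "g"]
gold = {"g"}

# Both orderings score the same recall@4, although the gold slot differs.
print(recall_at_k(gold_first, gold, 4), first_gold_slot(gold_first, gold, 4))
print(recall_at_k(gold_last, gold, 4), first_gold_slot(gold_last, gold, 4))
```

Logging the gold slot alongside recall@k is one inexpensive way to see whether retrieved evidence tends to land in the positions the reader attends to least.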

Key insights

Multimodal KB-VQA suffers from a "Lost at the End" primacy bias, where initial context is heavily favored, demanding reader-side interventions.

Method

The "gold-position protocol" systematically varies the prompt slot in which the gold passage appears while holding the rest of the context fixed. Targeted ablations, including a text-only control and image/distractor shuffling, localize the bias to prompt slot 0 in instruction-tuned readers.
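The slot-varying idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the `Example` fields, the `Reader` signature, exact-match scoring, and the deliberately biased `first_passage_reader` are all hypothetical stand-ins for a real VLM reader and benchmark.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Sequence


@dataclass
class Example:
    question: str
    image: Any              # placeholder for whatever image input the reader takes
    gold: str               # passage that contains the answer
    distractors: List[str]  # retrieved passages that do not contain the answer
    answer: str


# Assumed reader interface: (question, image, ordered passages) -> answer string.
Reader = Callable[[str, Any, List[str]], str]


def place_gold(gold: str, distractors: Sequence[str], slot: int) -> List[str]:
    """Return the context with the gold passage inserted at index `slot`."""
    passages = list(distractors)
    passages.insert(slot, gold)
    return passages


def accuracy_by_slot(examples: List[Example], reader: Reader, num_slots: int) -> Dict[int, float]:
    """Exact-match accuracy with the gold passage forced into each slot in turn."""
    scores: Dict[int, float] = {}
    for slot in range(num_slots):
        correct = 0
        for ex in examples:
            passages = place_gold(ex.gold, ex.distractors[: num_slots - 1], slot)
            prediction = reader(ex.question, ex.image, passages)
            correct += int(prediction.strip().lower() == ex.answer.strip().lower())
        scores[slot] = correct / len(examples)
    return scores


def first_minus_last(scores: Dict[int, float]) -> float:
    """Gap in points between gold-first and gold-last accuracy."""
    return 100.0 * (scores[min(scores)] - scores[max(scores)])


def first_passage_reader(question: str, image: Any, passages: List[str]) -> str:
    """Toy reader with extreme primacy: it only ever reads slot 0."""
    return passages[0].split(":")[-1].strip()


toy_examples = [
    Example("Which city is shown?", None, "city: Paris", ["city: Rome", "city: Madrid"], "Paris"),
]
toy_scores = accuracy_by_slot(toy_examples, first_passage_reader, num_slots=3)
print(toy_scores, first_minus_last(toy_scores))
```

Swapping `first_passage_reader` for a wrapper around an actual reader would yield a per-slot accuracy curve; a large positive `first_minus_last` value is the signature the summary describes.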

Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.