Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict

Source: Computation and Language · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new study introduces Context-Driven Decomposition (CDD), an inference-time belief-decomposition probe for diagnosing how Retrieval-Augmented Generation (RAG) models handle conflicts between retrieved context and their parametric knowledge. CDD was evaluated on Epi-Scale stress tests, TruthfulQA misconception injection, and cross-model reruns, and the results show three patterns. First, standard RAG reached only 15.0% accuracy on TruthfulQA misconception injection (N=500), an adversarial setting, which indicates that it handles misleading retrieved context poorly. Second, CDD improved accuracy across model families, including Gemini-2.5-Flash and Claude Haiku/Sonnet/Opus, although the causal coupling between rationale and answer did not transfer consistently. Third, CDD was more robust to temporal drift and noisy distractors, reaching 71.3% on temporal shifts and 69.9% on distractor evidence on the Epi-Scale benchmark.

Key takeaway

For AI architects and research scientists evaluating RAG robustness, this research highlights a critical vulnerability in how RAG handles conflicting information. Consider adding conflict-resolution diagnostics such as Context-Driven Decomposition (CDD) to your evaluation pipelines to measure and improve context compliance. Doing so addresses reliability from a different angle than retrieval quality alone, which matters most when RAG is deployed in dynamic or adversarial environments.

Key insights

Context-Driven Decomposition (CDD) measures and improves RAG's ability to resolve knowledge conflicts between retrieved context and parametric knowledge.

Method

Context-Driven Decomposition (CDD) is an inference-time belief-decomposition probe that intervenes on controlled retrieval conflict to diagnose context compliance in RAG systems.
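
To make the idea of intervening on controlled retrieval conflict concrete, here is a minimal sketch of a conflict evaluation harness. It is not the CDD implementation from the study, whose internals this summary does not describe. It only shows one plausible way to inject a contradicting passage and score the outcome. The names `ConflictCase`, `ConflictReport`, `evaluate_conflict`, and the `always_comply` stub are all hypothetical, and the substring match is a crude stand-in for a real answer grader.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Assumed interface (hypothetical): a model maps (question, optional context) to an answer.
AnswerFn = Callable[[str, Optional[str]], str]


@dataclass
class ConflictCase:
    question: str
    gold_answer: str        # the correct answer
    injected_context: str   # passage that contradicts the gold answer
    injected_answer: str    # the wrong answer the passage supports


@dataclass
class ConflictReport:
    accuracy: float         # correct answers despite the conflicting passage
    compliance_rate: float  # answers that adopted the injected claim
    flipped_rate: float     # correct without context, wrong with it


def _matches(prediction: str, target: str) -> bool:
    # Crude substring grader, used here only to keep the sketch self-contained.
    return target.strip().lower() in prediction.strip().lower()


def evaluate_conflict(model: AnswerFn, cases: list[ConflictCase]) -> ConflictReport:
    if not cases:
        raise ValueError("cases must be non-empty")
    correct = complied = flipped = 0
    for case in cases:
        closed_book = model(case.question, None)                    # parametric answer
        with_context = model(case.question, case.injected_context)  # answer under conflict
        is_correct = _matches(with_context, case.gold_answer)
        correct += is_correct
        complied += _matches(with_context, case.injected_answer)
        flipped += _matches(closed_book, case.gold_answer) and not is_correct
    n = len(cases)
    return ConflictReport(correct / n, complied / n, flipped / n)


def always_comply(question: str, context: Optional[str]) -> str:
    # Toy stand-in for a RAG model: parrots the passage whenever one is given.
    memory = {"What is the capital of France?": "Paris"}
    if context is not None:
        return context
    return memory.get(question, "unknown")


demo_cases = [
    ConflictCase("What is the capital of France?", "Paris",
                 "The capital of France is Lyon.", "Lyon"),
    ConflictCase("How many legs does a spider have?", "eight",
                 "A spider has six legs.", "six"),
]

report = evaluate_conflict(always_comply, demo_cases)
print(report)
```

With the toy model, every answer adopts the injected claim, so accuracy is 0.0 and the compliance rate is 1.0. The flipped rate separates cases where the model knew the answer without context from cases where it never did, which is the distinction a conflict diagnostic needs and plain accuracy hides.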

Best for: AI Architect, AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.