Does RAG Know When Retrieval Is Wrong? Diagnosing Context Compliance under Knowledge Conflict
Summary
A new study introduces Context-Driven Decomposition (CDD), an inference-time belief-decomposition probe for diagnosing how Retrieval-Augmented Generation (RAG) models handle conflicts between retrieved context and their parametric knowledge. CDD was evaluated on Epi-Scale stress tests, TruthfulQA misconception injection, and cross-model reruns, which revealed three key patterns. First, standard RAG achieved only 15.0% accuracy on adversarial TruthfulQA misconception injection (N=500), indicating that it tends to follow misleading retrieved context rather than override it. Second, CDD significantly improved accuracy across model families, including Gemini-2.5-Flash and Claude Haiku/Sonnet/Opus, though the causal coupling between rationale and answer did not transfer consistently. Third, CDD improved robustness to temporal drift and noisy distractors, reaching 71.3% on temporal shifts and 69.9% on distractor evidence on the Epi-Scale benchmark.
Key takeaway
For AI Architects and Research Scientists evaluating RAG system robustness, this research highlights a critical vulnerability in how RAG handles conflicting information. Consider integrating conflict-resolution diagnostics like Context-Driven Decomposition (CDD) into your evaluation pipelines to measure how readily a system defers to incorrect retrieved context and to improve how it resolves such conflicts. This approach offers a path to reliability that goes beyond retrieval quality alone, especially when deploying RAG in dynamic or adversarial environments.
Key insights
Context-Driven Decomposition (CDD) measures and improves RAG's ability to resolve knowledge conflicts between retrieved context and parametric knowledge.
Principles
- Context compliance is a measurable RAG vulnerability.
- Adversarial accuracy gains can transfer across models.
- Explicit conflict decomposition improves RAG robustness.
Method
Context-Driven Decomposition (CDD) is an inference-time belief-decomposition probe that intervenes on controlled retrieval conflict to diagnose context compliance in RAG systems.
In practice
- Use CDD to diagnose RAG context compliance.
- Apply CDD for improved robustness to drift.
- Test RAG systems with the Epi-Scale benchmark.
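As a rough illustration of what measuring context compliance in an evaluation pipeline could look like, the sketch below scores a batch of conflict cases. It is not the study's CDD implementation: the `ConflictCase` fields, the `score_conflict_cases` function, and the exact-match comparison are all assumptions made for this example, and the toy data is invented.

```python
from dataclasses import dataclass


@dataclass
class ConflictCase:
    """One evaluation item with a deliberately misleading retrieved context.

    Hypothetical schema for illustration; not taken from the study.
    """
    gold: str       # the correct answer
    injected: str   # the wrong answer planted in the retrieved context
    predicted: str  # the answer the RAG system actually gave


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivial differences do not count.
    return " ".join(text.lower().split())


def score_conflict_cases(cases: list[ConflictCase]) -> dict[str, float]:
    """Return accuracy and the share of answers that echo the injected claim."""
    if not cases:
        raise ValueError("need at least one case")
    n = len(cases)
    correct = sum(_normalize(c.predicted) == _normalize(c.gold) for c in cases)
    complied = sum(_normalize(c.predicted) == _normalize(c.injected) for c in cases)
    return {
        "n": float(n),
        "accuracy": correct / n,
        "compliance_rate": complied / n,
    }


# Toy data (invented): one correct answer, two that follow the injected
# misconception, and one that matches neither.
toy_cases = [
    ConflictCase(gold="no", injected="yes", predicted="No"),
    ConflictCase(gold="no", injected="yes", predicted="yes"),
    ConflictCase(gold="paris", injected="lyon", predicted="Lyon"),
    ConflictCase(gold="1969", injected="1972", predicted="1970"),
]
print(score_conflict_cases(toy_cases))
```

Exact string matching is the simplest possible judge and would undercount free-form answers; a real pipeline would likely swap in whatever answer-matching rule the benchmark prescribes. Reporting the compliance rate alongside accuracy separates answers that follow the injected claim from answers that are wrong for other reasons.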
Topics
- Retrieval-Augmented Generation
- Context Compliance
- Context-Driven Decomposition
- Knowledge Conflict
- Epi-Scale Benchmark
Best for: AI Architect, AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.