FIDES: Faithful Inference via Deep Evidence Signals for Retrieval-Memory Conflict in RAG
Summary
FIDES (Faithful Inference via Deep Evidence Signals) is a training-free decoding method for retrieval-memory conflict in Retrieval-Augmented Generation (RAG), the failure mode where a language model ignores retrieved context that contradicts its parametric memory. FIDES counters this by applying token-level contrastive pressure. Existing methods assume the bias is uniform across tokens. FIDES instead builds on the finding that conflict is heterogeneous and concentrates on a small fraction of answer-critical tokens. It fuses three internal signals to set the intervention strength at each decoding step: Opposition (output surface), Shift (hidden representations), and Noise (prediction trajectory). Across three benchmarks and six backbones (7B to 70B), FIDES achieved the best context fidelity in all 18 settings, outperforming the strongest training-free baseline (AdaCAD) by 3 to 13 points. On LLaMA3-70B, fidelity reached 92–94% and F1 reached 62–63%, at only 8–11% overhead relative to CAD.
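For reference, context-aware decoding (CAD, Shi et al., 2023) scores each next token by contrasting logits computed with and without the retrieved context c. The formula below assumes that FIDES keeps this combination rule and only replaces CAD's single global α with a per-step coefficient α_t. The exact form used in the paper may differ.

```latex
y_t \sim \operatorname{softmax}\!\Big[(1+\alpha_t)\,\operatorname{logit}_\theta(y_t \mid c, x, y_{<t}) \;-\; \alpha_t\,\operatorname{logit}_\theta(y_t \mid x, y_{<t})\Big]
```

Setting α_t = 0 recovers ordinary decoding with the context. A larger α_t pushes the distribution further from what the model would predict from memory alone. Plain CAD applies one fixed α to every token.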
Key takeaway
For AI scientists and ML engineers building RAG systems, FIDES offers a training-free way to substantially improve context fidelity, especially when retrieved evidence contradicts model memory. Because it adjusts contrastive pressure per token, it can reduce cases where the model persistently falls back on memorized answers. It also improves overall generation quality, even on large models such as LLaMA3-70B. Consider integrating FIDES where RAG applications must reliably follow the provided context, particularly in high-stakes factual domains.
Key insights
Retrieval-memory conflict in RAG is token-level, requiring dynamic, deep-signal-driven contrastive decoding for faithfulness.
Principles
- Parametric bias is heterogeneous across tokens.
- Deep internal signals reveal conflict intensity.
- Token-level contrast improves fidelity and utility.
Method
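FIDES runs paired forward passes, one with and one without the retrieved context. From these it extracts three signals: Opposition (Jensen-Shannon divergence, JSD), Shift (L2 distance between hidden states), and Noise (KL divergence of intermediate-layer logits). It then fuses the signals and maps the fused score to a token-specific contrastive coefficient α_t.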
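The following NumPy sketch shows how the three signals could be extracted for a single decoding step. It is an illustration under stated assumptions, not the authors' implementation. The summary does not give the paper's layer choices, the direction of the KL term, or any normalization. For that reason, the function name `conflict_signals`, the use of one hidden-state vector per pass, and the logit-lens style intermediate logits (`mid_logits_ctx`) are all hypothetical.

```python
import numpy as np

def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> float:
    """KL(p || q) in nats; eps guards against log(0)."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def jsd(p: np.ndarray, q: np.ndarray) -> float:
    """Jensen-Shannon divergence: symmetric and bounded by ln 2."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def conflict_signals(logits_ctx, logits_noctx, h_ctx, h_noctx, mid_logits_ctx):
    """Hypothetical per-token conflict signals for one decoding step.

    logits_ctx / logits_noctx: final-layer logits with / without retrieved context.
    h_ctx / h_noctx: one hidden-state vector per pass (assumed: last layer, last position).
    mid_logits_ctx: logits read off an intermediate layer of the context pass
        (assumed: a logit-lens style projection through the output head).
    """
    p_ctx, p_noctx = softmax(logits_ctx), softmax(logits_noctx)
    opposition = jsd(p_ctx, p_noctx)                # output-surface disagreement
    shift = float(np.linalg.norm(h_ctx - h_noctx))  # hidden-representation displacement
    noise = kl(p_ctx, softmax(mid_logits_ctx))      # intermediate vs. final prediction gap
    return opposition, shift, noise
```

All three signals are zero when the two passes agree exactly. They grow as the context pushes the model away from its memory-only prediction. In a real model, the inputs would come from two forward passes that request hidden states.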
In practice
- Implement dual-path decoding for conflict detection.
- Monitor JSD, L2 distance, and KL divergence for token-level risk.
- Apply adaptive contrastive weights based on signal fusion.
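The last two practices, fusing the signals into a weight and applying that weight, can be sketched as follows. The summary does not specify how FIDES fuses the signals. The fusion rule here (normalize, take a weighted sum, squash with `tanh` into `[0, alpha_max]`) and the parameters `weights`, `scales`, and `alpha_max` are therefore hypothetical stand-ins. The contrast step uses the standard CAD combination `(1 + α) · z_ctx − α · z_noctx`.

```python
import numpy as np

def fuse_to_alpha(opposition: float, shift: float, noise: float,
                  weights=(1.0, 1.0, 1.0), scales=(np.log(2), 1.0, 1.0),
                  alpha_max: float = 1.0) -> float:
    """Hypothetical fusion of the three conflict signals into alpha_t.

    Each signal is divided by a rough scale (JSD is bounded by ln 2), the
    results are combined with non-negative weights, and tanh maps the score
    into [0, alpha_max] so that zero conflict gives zero intervention.
    """
    normalized = np.array([opposition, shift, noise]) / np.array(scales)
    score = float(np.dot(np.array(weights), normalized))
    return float(alpha_max * np.tanh(max(score, 0.0)))

def contrastive_logits(logits_ctx: np.ndarray, logits_noctx: np.ndarray,
                       alpha: float) -> np.ndarray:
    """CAD-style combination: amplify what the context adds over memory."""
    return (1.0 + alpha) * logits_ctx - alpha * logits_noctx

def decode_step(logits_ctx, logits_noctx, opposition, shift, noise, **fuse_kwargs):
    """Greedy token choice for one step using a token-specific alpha_t."""
    alpha_t = fuse_to_alpha(opposition, shift, noise, **fuse_kwargs)
    token = int(np.argmax(contrastive_logits(logits_ctx, logits_noctx, alpha_t)))
    return token, alpha_t

# Toy step: the context pass barely prefers token 0, memory strongly prefers it.
z_ctx = np.array([2.0, 1.8, 0.0])
z_noctx = np.array([4.0, 0.0, 0.0])
for alpha in (0.0, 1.0):
    print(alpha, int(np.argmax(contrastive_logits(z_ctx, z_noctx, alpha))))
# prints "0.0 0" then "1.0 1"
```

In the toy step, the context pass still slightly favors the memorized token 0. With α = 0, greedy decoding returns token 0. With α = 1, the choice flips to the context-supported token 1. The `tanh` mapping is one simple way to leave non-conflicting tokens untouched while scaling intervention up where the signals indicate strong conflict.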
Topics
- Retrieval-Augmented Generation
- Contrastive Decoding
- Hallucination Mitigation
- Context Fidelity
- Large Language Models
- Token-level Control
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.