ProvenAI: Provenance-Native Traces of Evidence in Generated Answers
Summary
ProvenAI is a framework for improving transparency in multi-hop question answering. It decomposes transparency into three independently measurable layers: answer correctness, citation fidelity against benchmark supporting evidence, and per-document influence determined by leave-one-resource-out intervention. Targeting the HotpotQA distractor benchmark, ProvenAI uses a seven-stage pipeline: data normalisation, retrieval indexing, citation-aware answer generation, attribution auditing, ablation-based influence estimation, batch evaluation, and interactive inspection. The framework was evaluated on 7,405 validation examples drawn from a corpus of 509,300 passages, achieving 53.53% answer accuracy and a mean citation-fidelity score of 71.55%. A key finding is the "citation-influence gap": a clean citation audit can coexist with weak influence from cited sources and strong influence from uncited ones. This underscores the need for traceable links across retrieved, cited, and behaviourally influential evidence.
Key takeaway
NLP engineers developing retrieval-augmented generation (RAG) systems should move beyond simple citation checks and measure how much each source actually influences the answer. Evaluate transparency in layers: answer correctness, citation fidelity, and per-document influence. This helps you identify and mitigate the "citation-influence gap," so that your system's outputs are genuinely grounded in the evidence they cite, which in turn supports trustworthiness and factual accuracy.
Key insights
Meaningful transparency in retrieval-grounded QA requires independently measuring correctness, citation fidelity, and source influence.
Principles
- Citations alone do not confirm source influence.
- Decompose transparency into distinct, measurable layers.
- Causal-mediation analysis grounds provenance theory.
Method
ProvenAI uses a seven-stage pipeline: data normalisation, retrieval indexing, citation-aware answer generation, attribution auditing, ablation-based influence estimation, batch evaluation, and interactive inspection.
In practice
- Audit citation fidelity against benchmark evidence.
- Estimate source influence via leave-one-resource-out.
- Identify "citation-influence gaps" in QA outputs.
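The three practices above can be illustrated with a minimal Python sketch. It is not ProvenAI's implementation: the function names, the precision-style fidelity score, the binary "did the answer change" influence score, and the toy `toy_answer` function standing in for a real QA system are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Set

# Hypothetical type: a QA system takes a question and a mapping of
# document id -> text, and returns an answer string.
AnswerFn = Callable[[str, Dict[str, str]], str]


def citation_fidelity(cited: Set[str], gold: Set[str]) -> float:
    """Assumed definition: share of cited documents that appear in the
    benchmark supporting evidence (a precision-style score)."""
    if not cited:
        return 0.0
    return len(cited & gold) / len(cited)


def leave_one_out_influence(
    question: str, docs: Dict[str, str], answer_fn: AnswerFn
) -> Dict[str, float]:
    """Remove one document at a time and record whether the answer changes.
    Assumed scoring: 1.0 if the answer changes, 0.0 otherwise."""
    baseline = answer_fn(question, docs)
    influence: Dict[str, float] = {}
    for doc_id in docs:
        ablated = {k: v for k, v in docs.items() if k != doc_id}
        changed = answer_fn(question, ablated) != baseline
        influence[doc_id] = 1.0 if changed else 0.0
    return influence


def citation_influence_gap(
    cited: Set[str], influence: Dict[str, float], threshold: float = 0.5
) -> Dict[str, List[str]]:
    """Compare the cited set with the set of influential documents."""
    influential = {d for d, score in influence.items() if score >= threshold}
    return {
        "cited_but_not_influential": sorted(cited - influential),
        "influential_but_uncited": sorted(influential - cited),
    }


def toy_answer(question: str, docs: Dict[str, str]) -> str:
    """Toy stand-in for a QA model: its answer depends only on document d2."""
    return "yes" if "d2" in docs else "unknown"


docs = {"d1": "first passage", "d2": "second passage", "d3": "distractor"}
cited = {"d1"}        # what the system cited
gold = {"d1", "d2"}   # benchmark supporting evidence

fidelity = citation_fidelity(cited, gold)
influence = leave_one_out_influence("toy question", docs, toy_answer)
gap = citation_influence_gap(cited, influence)
```

In this toy case the citation audit is clean (`fidelity` is 1.0) even though the cited document `d1` has no measured influence and the uncited document `d2` determines the answer, which is the pattern the summary calls the citation-influence gap. A real system would likely need a softer influence measure than exact answer change, since this sketch requires one extra model call per retrieved document and treats any wording difference as a change.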
Topics
- ProvenAI
- Retrieval-Augmented Generation
- Multi-hop Question Answering
- Citation Fidelity
- Source Influence
- Transparency Metrics
Best for: Research Scientist, AI Architect, AI Engineer, AI Scientist, NLP Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.