CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification
Summary
CuraView is a multi-agent framework designed to detect and explain faithfulness hallucinations in discharge summaries, which are critical documents derived from electronic health records (EHRs). These hallucinations, where LLMs generate statements contradicting source records, pose significant patient safety risks. CuraView addresses this by constructing a GraphRAG-based knowledge graph from patient EHRs and implementing a closed-loop generation-detection pipeline. This pipeline performs sentence-level evidence retrieval and classifies evidence into four grades (E1-E4), ranging from strong support to direct contradiction, providing structured and interpretable evidence chains. Evaluated on a 250-patient subset of the Discharge-Me benchmark, CuraView's fine-tuned Qwen3-14B detection model achieved an F1 score of 0.831 for safety-critical E4 contradictions (90.9% recall, 76.5% precision) and 0.823 for E3+E4, marking a 50.0% relative improvement over the base model and outperforming RAGTruth-style and QAGS-style baselines.
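The reported E4 F1 score can be sanity-checked against the stated precision and recall. The snippet below is a small arithmetic check, not code from the paper; the function name `f1_score` is chosen here for illustration.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Figures quoted in the summary for E4 contradictions.
e4_precision = 0.765
e4_recall = 0.909

e4_f1 = f1_score(e4_precision, e4_recall)
print(round(e4_f1, 3))  # 0.831, matching the reported score
```

The high-recall, lower-precision balance means the detector misses few contradictions at the cost of more false alarms, a common trade-off for safety-critical screening.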
Key takeaway
For MLOps engineers deploying LLMs in clinical settings, CuraView demonstrates a practical way to mitigate faithfulness hallucinations in discharge summaries. Integrating a GraphRAG-enhanced, evidence-chain-based verification step can make LLM-generated clinical documentation more factually reliable. Consider adopting a similar multi-agent, closed-loop detection pipeline to improve patient safety and to generate reusable annotated datasets for future model training.
Key insights
CuraView uses GraphRAG and multi-agent verification to detect and explain medical LLM hallucinations, improving factual reliability.
Principles
- Evidence-chain verification improves factual reliability.
- GraphRAG enhances knowledge verification in LLMs.
Method
CuraView builds a GraphRAG knowledge graph from EHRs, then uses a closed-loop generation-detection pipeline with sentence-level evidence retrieval and classification into four grades (E1-E4) to identify contradictions.
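The sentence-level loop described above might be sketched as follows. This is a minimal illustration under stated assumptions, not CuraView's implementation: the names `Grade`, `Verdict`, `verify_summary`, `toy_retrieve`, and `toy_classify` are hypothetical, the summary only defines E1 (strong support) and E4 (direct contradiction), and the keyword retriever and rule-based classifier stand in for the GraphRAG retrieval and the fine-tuned detection model.

```python
import re
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List


class Grade(IntEnum):
    E1 = 1  # strong support (per the summary)
    E2 = 2  # intermediate grade; definition not given in the summary
    E3 = 3  # intermediate grade; definition not given in the summary
    E4 = 4  # direct contradiction (per the summary)


@dataclass
class Verdict:
    sentence: str
    evidence: List[str]
    grade: Grade


def split_sentences(text: str) -> List[str]:
    # Naive splitter; a clinical pipeline would need something sturdier.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def verify_summary(
    summary: str,
    retrieve: Callable[[str], List[str]],
    classify: Callable[[str, List[str]], Grade],
) -> List[Verdict]:
    """Retrieve evidence for each sentence, then grade it against that evidence."""
    verdicts = []
    for sentence in split_sentences(summary):
        evidence = retrieve(sentence)
        verdicts.append(Verdict(sentence, evidence, classify(sentence, evidence)))
    return verdicts


def flagged(verdicts: List[Verdict], threshold: Grade = Grade.E3) -> List[Verdict]:
    # E3+E4 are reported together in the summary, so E3 is the default cutoff.
    return [v for v in verdicts if v.grade >= threshold]


# Toy stand-ins (assumptions) so the sketch runs end to end.
FACTS = ["Patient is allergic to penicillin.", "Metformin 500 mg was started."]


def _keywords(text: str) -> set:
    return {w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 5}


def toy_retrieve(sentence: str) -> List[str]:
    return [f for f in FACTS if _keywords(f) & _keywords(sentence)]


def toy_classify(sentence: str, evidence: List[str]) -> Grade:
    if sentence in evidence:
        return Grade.E1  # verbatim match counts as strong support
    if evidence:
        return Grade.E4  # same topic, different claim: treated as contradiction
    return Grade.E3  # no evidence found; mapping this to E3 is an assumption


verdicts = verify_summary(
    "Patient is allergic to penicillin. Patient has no known drug allergies.",
    toy_retrieve,
    toy_classify,
)
for v in flagged(verdicts):
    print(v.grade.name, "|", v.sentence, "|", v.evidence)
```

Keeping the retriever and classifier as injected callables lets the same loop be reused while either component is swapped out or retrained.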
In practice
- Fine-tune Qwen3-14B for medical hallucination detection; the reported gains are for the fine-tuned model, not the base model.
- Implement GraphRAG for EHR knowledge verification.
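As a rough picture of graph-backed evidence lookup over an EHR, the sketch below indexes subject-relation-object triples by entity and pulls every triple touching an entity mentioned in a sentence. It is a heavily simplified stand-in for GraphRAG, assuming the triples have already been extracted; the triple schema, the function names, and the substring matching are all illustrative assumptions rather than details from the paper.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (subject, relation, object)


def build_graph(triples: List[Triple]) -> Dict[str, List[Triple]]:
    """Index each triple under both of its endpoint entities."""
    graph: Dict[str, List[Triple]] = defaultdict(list)
    for subj, rel, obj in triples:
        graph[subj.lower()].append((subj, rel, obj))
        graph[obj.lower()].append((subj, rel, obj))
    return graph


def retrieve_triples(graph: Dict[str, List[Triple]], sentence: str) -> List[Triple]:
    """Return triples whose endpoint entity appears in the sentence (naive substring match)."""
    text = sentence.lower()
    hits: List[Triple] = []
    for entity, edges in graph.items():
        if entity in text:
            for edge in edges:
                if edge not in hits:
                    hits.append(edge)
    return hits


# Hypothetical triples standing in for facts extracted from one patient's record.
ehr_triples = [
    ("patient", "allergic_to", "penicillin"),
    ("patient", "prescribed", "metformin"),
    ("metformin", "dose", "500 mg"),
]

graph = build_graph(ehr_triples)
evidence = retrieve_triples(graph, "Discharged on metformin 1000 mg daily.")
print(evidence)
```

Here the retrieved dose triple is what a downstream grader would compare against the sentence's "1000 mg" claim.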
Topics
- Medical Hallucination Detection
- Multi-Agent Frameworks
- GraphRAG
- Electronic Health Records
- Clinical Documentation
Best for: NLP Engineer, AI Scientist, Research Scientist, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.