CuraView: A Multi-Agent Framework for Medical Hallucination Detection with GraphRAG-Enhanced Knowledge Verification

Source: Takara TLDR - Daily AI Papers · Field: Health & Wellbeing — Clinical Care & Medical Practice, Medical Devices & Health Technology, Health & Medical Research · Depth: Expert, medium

Summary

CuraView is a multi-agent framework designed to detect and explain faithfulness hallucinations in discharge summaries, which are critical documents derived from electronic health records (EHRs). These hallucinations, where LLMs generate statements contradicting source records, pose significant patient safety risks. CuraView addresses this by constructing a GraphRAG-based knowledge graph from patient EHRs and implementing a closed-loop generation-detection pipeline. This pipeline performs sentence-level evidence retrieval and classifies evidence into four grades (E1-E4), ranging from strong support to direct contradiction, providing structured and interpretable evidence chains. Evaluated on a 250-patient subset of the Discharge-Me benchmark, CuraView's fine-tuned Qwen3-14B detection model achieved an F1 score of 0.831 for safety-critical E4 contradictions (90.9% recall, 76.5% precision) and 0.823 for E3+E4, marking a 50.0% relative improvement over the base model and outperforming RAGTruth-style and QAGS-style baselines.
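As a quick sanity check on the reported E4 numbers, the F1 score can be recomputed from the stated precision and recall using the standard harmonic-mean definition. This is only an arithmetic sketch; the function name `f1_score` is illustrative and not taken from the paper.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall for E4 contradictions as reported in the summary.
e4_precision = 0.765
e4_recall = 0.909

print(round(f1_score(e4_precision, e4_recall), 3))  # 0.831
```

The recomputed value matches the reported F1 of 0.831, so the three E4 figures are mutually consistent.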

Key takeaway

For MLOps engineers deploying LLMs in clinical settings, CuraView shows one way to catch faithfulness hallucinations in discharge summaries: check each generated sentence against a GraphRAG knowledge graph built from the patient's EHR and attach a graded evidence chain to the verdict. A similar multi-agent, closed-loop detection pipeline could raise the factual reliability of LLM-generated clinical documentation, improve patient safety, and produce reusable annotated datasets for future model training.

Key insights

CuraView uses GraphRAG and multi-agent verification to detect and explain medical LLM hallucinations, improving factual reliability.

Principles

Method

CuraView builds a GraphRAG knowledge graph from EHRs, then uses a closed-loop generation-detection pipeline with sentence-level evidence retrieval and classification into four grades (E1-E4) to identify contradictions.
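The sentence-level grading step described above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the names `EvidenceGrade`, `SentenceVerdict`, `verify_summary`, and `flag_for_review` are hypothetical, the retriever and classifier are passed in as plain callables standing in for the GraphRAG lookup and the fine-tuned detection model, and the closed-loop regeneration step is not modeled. The summary only defines the endpoints of the scale (E1 strong support, E4 direct contradiction), so E2 and E3 are left as unnamed intermediate grades.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, List


class EvidenceGrade(IntEnum):
    E1 = 1  # strong support (per the summary)
    E2 = 2  # intermediate grade; exact definition not given in the summary
    E3 = 3  # intermediate grade; exact definition not given in the summary
    E4 = 4  # direct contradiction (per the summary)


@dataclass
class SentenceVerdict:
    sentence: str
    evidence: List[str]  # retrieved evidence forming the chain for this sentence
    grade: EvidenceGrade


def verify_summary(
    sentences: List[str],
    retrieve: Callable[[str], List[str]],
    classify: Callable[[str, List[str]], EvidenceGrade],
) -> List[SentenceVerdict]:
    """Retrieve evidence for each sentence, then grade it against that evidence."""
    verdicts = []
    for sentence in sentences:
        evidence = retrieve(sentence)  # stand-in for the knowledge-graph lookup
        grade = classify(sentence, evidence)  # stand-in for the detection model
        verdicts.append(SentenceVerdict(sentence, evidence, grade))
    return verdicts


def flag_for_review(
    verdicts: List[SentenceVerdict],
    threshold: EvidenceGrade = EvidenceGrade.E3,
) -> List[SentenceVerdict]:
    """Return sentences graded at or above the threshold (default: E3 and E4)."""
    return [v for v in verdicts if v.grade >= threshold]


if __name__ == "__main__":
    # Toy stubs: grades are hard-coded purely to exercise the control flow.
    toy_grades = {"Sentence A": EvidenceGrade.E1, "Sentence B": EvidenceGrade.E4}
    results = verify_summary(
        list(toy_grades),
        retrieve=lambda s: [f"record for {s}"],
        classify=lambda s, ev: toy_grades[s],
    )
    for verdict in flag_for_review(results):
        print(verdict.grade.name, verdict.sentence)
```

Keeping the evidence list on each verdict is what makes the output inspectable: a reviewer sees the grade together with the records that produced it. The default threshold mirrors the E3+E4 grouping used in the reported metrics; passing `EvidenceGrade.E4` restricts the output to direct contradictions only.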

Best for: NLP Engineer, AI Scientist, Research Scientist, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.