Hallucination in Medical Imaging AI: A Cross-Modality Analytical Framework for Taxonomy, Detection, and Mitigation under Regulatory Constraints

Source: cs.AI updates on arXiv.org · Field: Health & Wellbeing (Medical Devices & Health Technology, Healthcare Systems & Policy, Clinical Care & Medical Practice) · Depth: Expert, extended

Summary

The paper presents a cross-modality analytical framework for hallucination in medical imaging AI. It synthesizes peer-reviewed studies, benchmark datasets, and FDA regulatory guidance (2023-2026) across five modalities: CT, MRI, PET/SPECT, ultrasound, and digital pathology. The study finds that no single taxonomic framework is sufficient and that three distinct approaches must be integrated. Counterintuitively, general-purpose foundation models outperform medical-specialized models on hallucination-specific benchmarks, with a median hallucination-free rate of 76.6% versus 51.3% (p=0.012). Mitigation strategies identified as effective include physics-informed architectural constraints, Chain-of-Thought prompting (reducing hallucinations by up to 86.4%), and human-in-the-loop safeguards. All are mapped to the FDA's Total Product Lifecycle (TPLC) and Predetermined Change Control Plan (PCCP) frameworks, which treat hallucination management as a continuous lifecycle obligation.
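To make the headline metric concrete, here is a minimal sketch of how a median hallucination-free rate per model group could be computed. The function names, model names, and flag data are all hypothetical, invented for illustration; they are not the paper's data, and the paper's scoring protocol and significance test are not described in this summary.

```python
from statistics import median


def hallucination_free_rate(flags):
    """Fraction of evaluated outputs with no hallucination.

    flags: iterable of bool, True meaning the output was judged hallucinated.
    """
    flags = list(flags)
    if not flags:
        raise ValueError("need at least one evaluated output")
    return sum(1 for flagged in flags if not flagged) / len(flags)


def group_median_rate(per_model_flags):
    """Median hallucination-free rate across the models in one group."""
    return median(hallucination_free_rate(f) for f in per_model_flags.values())


# Hypothetical toy data (NOT from the paper): four judged outputs per model.
general = {
    "general_a": [False, False, False, True],   # rate 0.75
    "general_b": [False, False, False, False],  # rate 1.00
    "general_c": [False, True, False, True],    # rate 0.50
}
specialized = {
    "medical_a": [True, True, False, False],    # rate 0.50
    "medical_b": [True, False, True, True],     # rate 0.25
    "medical_c": [False, False, True, False],   # rate 0.75
}

# Difference between the two group medians on the toy data.
gap = group_median_rate(general) - group_median_rate(specialized)
print(f"median gap: {gap:.2f}")
```

Note that the rate is computed per model first and the median is taken across models, so a single model with many evaluated outputs does not dominate its group.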

Key takeaway

AI/ML directors and research scientists deploying medical imaging AI should prioritize hallucination-specific benchmarks over general accuracy metrics when evaluating models: general-purpose models show a higher median hallucination-free rate than medical-specialized models (76.6% vs. 51.3%). Integrate Chain-of-Thought prompting and robust human-in-the-loop oversight, which is reported to reduce false positives by 83.7%, and manage both under the FDA's TPLC and PCCP frameworks as part of continuous lifecycle management.
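As a rough illustration of what human-in-the-loop oversight can look like in code, the sketch below routes model findings to a reviewer queue. Everything here is an assumption for illustration: the `Finding` fields, the `detector_flagged` signal, and the 0.9 threshold are hypothetical and are not taken from the paper or from FDA guidance.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    study_id: str
    description: str
    confidence: float       # model-reported confidence in [0, 1] (hypothetical field)
    detector_flagged: bool  # True if a separate hallucination check raised a concern


def needs_human_review(finding, min_confidence=0.9):
    """Return True if the finding should go to a clinician before release."""
    if not 0.0 <= finding.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {finding.confidence}")
    return finding.detector_flagged or finding.confidence < min_confidence


def triage(findings, min_confidence=0.9):
    """Split findings into (review_queue, auto_queue), preserving input order."""
    review_queue, auto_queue = [], []
    for finding in findings:
        if needs_human_review(finding, min_confidence):
            review_queue.append(finding)
        else:
            auto_queue.append(finding)
    return review_queue, auto_queue


# Hypothetical example findings, invented for illustration only.
sample = [
    Finding("s1", "no acute abnormality", 0.97, False),
    Finding("s2", "possible 4 mm nodule", 0.62, False),
    Finding("s3", "lesion in absent organ", 0.95, True),
]
review_queue, auto_queue = triage(sample)
print([f.study_id for f in review_queue])
```

The gate is deliberately conservative: either signal alone sends a finding to review, so a confident but detector-flagged output (like `s3`) is never released automatically. In a real deployment the threshold and the routing policy would themselves be artifacts to document and monitor over the product lifecycle.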

Key insights

General-purpose AI models exhibit lower hallucination rates than specialized medical models on hallucination-specific benchmarks, contrary to what might be expected.

Method

Using a structured narrative design, the authors synthesized peer-reviewed studies, benchmark datasets (Med-HallMark, MedHallBench), and FDA guidance (Dec 2024, Jan 2025) across five imaging modalities to analyze the taxonomy, etiology, detection, and mitigation of hallucination.

Best for: CTO, VP of Engineering/Data, AI Architect, Research Scientist, AI Scientist, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.