A Calibrated Memorization Index (MI) for Detecting Training Data Leakage in Generative MRI Models
Summary
A calibrated per-sample metric, the Memorization Index (MI), has been developed to detect training data leakage and duplication in generative MRI models. Generative image models can reproduce training data in their outputs, a privacy risk that is especially serious in medical imaging. The MI uses image features extracted by an MRI foundation model, aggregates multi-layer whitened nearest-neighbor similarities, and maps them to bounded Overfit/Novelty Index (ONI) and MI scores. In an evaluation on three MRI datasets with controlled duplication rates and standard image augmentations, the MI detected duplication robustly and produced consistent metric values across datasets. At the individual sample level, it identified duplicates with near-perfect accuracy.
Key takeaway
Research scientists and computer vision engineers who develop or deploy generative MRI models need to detect and mitigate training data leakage to protect patient privacy. The calibrated Memorization Index (MI) is a robust tool for detecting data duplication, with near-perfect accuracy at the sample level. Consider integrating it into your model evaluation pipeline, especially when working with sensitive medical imaging datasets.
Key insights
A calibrated per-sample metric detects training data duplication in generative MRI models, which supports privacy audits of their outputs.
Principles
- Feature-based similarity detects data duplication.
- Calibrated metrics improve cross-dataset consistency.
Method
The method extracts image features with an MRI foundation model, aggregates whitened nearest-neighbor similarities across multiple feature layers, and maps the result to bounded Overfit/Novelty Index (ONI) and Memorization Index (MI) scores.
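The pipeline described here can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not give the exact whitening, aggregation, or calibration formulas. The sketch assumes features are already extracted as one NumPy array per layer, and the function names (`whiten`, `nn_cosine`, `bounded_scores`), the mean over layers, and the logistic mapping with its `center` and `scale` parameters are all hypothetical choices made for illustration.

```python
import numpy as np


def whiten(train, query, eps=1e-6):
    """PCA-whiten both feature sets using statistics of the training features."""
    mean = train.mean(axis=0)
    cov = np.cov(train - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    eigvals = np.clip(eigvals, 0.0, None)      # guard against tiny negative values
    w = eigvecs / np.sqrt(eigvals + eps)       # scale each eigenvector column
    return (train - mean) @ w, (query - mean) @ w


def nn_cosine(train, query):
    """Cosine similarity of each query row to its nearest training row."""
    t = train / np.linalg.norm(train, axis=1, keepdims=True)
    q = query / np.linalg.norm(query, axis=1, keepdims=True)
    return (q @ t.T).max(axis=1)


def bounded_scores(train_layers, query_layers, center=0.5, scale=10.0):
    """Aggregate per-layer similarities and squash them into (0, 1).

    Assumption: the mean over layers and the logistic mapping stand in for
    the paper's aggregation and calibration, which the summary does not specify.
    """
    sims = [nn_cosine(*whiten(t, q)) for t, q in zip(train_layers, query_layers)]
    agg = np.mean(sims, axis=0)
    return 1.0 / (1.0 + np.exp(-scale * (agg - center)))
```

Each element of `train_layers` and `query_layers` is an array of shape `(n_samples, n_features)` for one layer, with more training samples than feature dimensions so that the covariance estimate is well conditioned. Under these assumptions, a generated sample that exactly copies a training image has a nearest-neighbor similarity of about 1 in every layer and therefore receives the highest score.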
In practice
- Apply MI to audit generative medical image models.
- Use ONI scores to assess model novelty.
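An audit along these lines could be sketched as follows. The sketch assumes per-sample scores in [0, 1] where higher means more likely memorized, and the `audit` helper and its `threshold` argument are hypothetical; the summary does not state a decision threshold, so any value would need to be chosen on data with known duplicates.

```python
import numpy as np


def audit(scores, threshold):
    """Report which generated samples meet or exceed a memorization threshold."""
    scores = np.asarray(scores, dtype=float)
    flagged = np.flatnonzero(scores >= threshold)
    return {
        "flagged_indices": flagged.tolist(),
        "flagged_rate": float(flagged.size / scores.size),
    }
```

Flagged samples can then be inspected next to their nearest training images before a model or a synthetic dataset is released.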
Topics
- Generative MRI Models
- Training Data Leakage
- Memorization Detection
- Medical Image Privacy
- MRI Foundation Models
Best for: Computer Vision Engineer, Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.