A Calibrated Memorization Index (MI) for Detecting Training Data Leakage in Generative MRI Models

2026-02-13 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition · Depth: Advanced, quick

Summary

A new calibrated per-sample metric, the Memorization Index (MI), detects training data leakage and duplication in generative MRI models. It addresses the privacy risk that image generative models reproduce training data in their outputs, a risk that is especially serious in medical imaging. The MI uses image features extracted by an MRI foundation model, aggregates multi-layer whitened nearest-neighbor similarities, and maps them to bounded Overfit/Novelty Index (ONI) and MI scores. Evaluated on three MRI datasets with controlled duplication rates and standard image augmentations, the MI detected duplication robustly and kept metric values consistent across datasets. At the individual sample level, it identified duplicates with near-perfect accuracy.

Key takeaway

For research scientists and computer vision engineers developing or deploying generative MRI models, understanding and mitigating training data leakage is crucial for patient privacy. The calibrated Memorization Index (MI) offers a robust tool for detecting data duplication, with near-perfect accuracy at the sample level. Consider integrating this metric into your model evaluation pipelines to check generated images for memorized training samples, especially when working with sensitive medical imaging datasets.

Key insights

A calibrated per-sample metric detects training data duplication in generative MRI models, which supports privacy auditing.

Principles

Feature-based similarity detects data duplication.
Calibrated metrics improve cross-dataset consistency.

Method

The method extracts image features with an MRI foundation model, aggregates multi-layer whitened nearest-neighbor similarities, and maps them to bounded Overfit/Novelty Index (ONI) and Memorization Index (MI) scores.
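
The pipeline described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the summary does not give the exact whitening scheme, the aggregation rule, or the calibration that yields ONI and MI, so everything below is an assumption made for illustration. The function names (`whiten`, `nn_cosine`, `bounded_similarity_score`), the ZCA-style whitening, the mean over layers, and the logistic squashing with its `midpoint` and `sharpness` parameters are all hypothetical. The per-layer feature matrices are assumed to come from some feature extractor and are replaced here by random data.

```python
import numpy as np

def whiten(train, query, eps=1e-6):
    """ZCA-style whitening fitted on training features, applied to both sets.
    (Assumption: the source only says similarities are 'whitened'.)"""
    mean = train.mean(axis=0)
    cov = np.cov(train, rowvar=False)
    vals, vecs = np.linalg.eigh(cov)
    vals = np.clip(vals, eps, None)          # guard against tiny or negative eigenvalues
    w = (vecs * vals ** -0.5) @ vecs.T       # symmetric inverse square root of cov
    return (train - mean) @ w, (query - mean) @ w

def nn_cosine(train, query):
    """Highest cosine similarity between each query row and any training row."""
    t = train / (np.linalg.norm(train, axis=1, keepdims=True) + 1e-12)
    q = query / (np.linalg.norm(query, axis=1, keepdims=True) + 1e-12)
    return (q @ t.T).max(axis=1)

def bounded_similarity_score(train_layers, gen_layers, midpoint=0.5, sharpness=10.0):
    """Average per-layer nearest-neighbor similarity, squashed into (0, 1).
    The logistic mapping is a stand-in, not the paper's ONI/MI calibration."""
    sims = [nn_cosine(*whiten(t, g)) for t, g in zip(train_layers, gen_layers)]
    agg = np.mean(sims, axis=0)              # one aggregated value per generated sample
    return 1.0 / (1.0 + np.exp(-sharpness * (agg - midpoint)))

# Synthetic stand-in for two feature layers (32 and 64 dimensions).
rng = np.random.default_rng(0)
train_layers = [rng.normal(size=(500, 32)), rng.normal(size=(500, 64))]
# First 5 "generated" samples are exact copies of training rows, the last 5 are new.
gen_layers = [np.vstack([t[:5], rng.normal(size=(5, t.shape[1]))]) for t in train_layers]
scores = bounded_similarity_score(train_layers, gen_layers)
print(scores.round(3))
```

In this toy setup the copied samples should score close to 1 and the fresh samples noticeably lower, which is the qualitative behavior a per-sample memorization score is meant to show.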

In practice

Apply MI to audit generative medical image models.
Use ONI scores to assess model novelty.
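
An audit step built on per-sample scores might look like the sketch below. The function name `audit_generated_set` and the `threshold` of 0.9 are hypothetical; the summary does not state a recommended cutoff, so any real threshold would need to be chosen from the original paper or from validation data.

```python
import numpy as np

def audit_generated_set(scores, threshold=0.9):
    """Flag generated samples whose memorization score meets a chosen cutoff.
    The default threshold is an illustrative assumption, not a published value."""
    scores = np.asarray(scores, dtype=float)
    flagged = np.flatnonzero(scores >= threshold)
    fraction = float(flagged.size / scores.size) if scores.size else 0.0
    return {
        "n_samples": int(scores.size),
        "flagged_indices": flagged.tolist(),
        "flagged_fraction": fraction,
    }

report = audit_generated_set([0.10, 0.95, 0.50, 0.99], threshold=0.9)
print(report)
```

Flagged indices can then be reviewed manually against their nearest training images before a model or synthetic dataset is released.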

Topics

Generative MRI Models
Training Data Leakage
Memorization Detection
Medical Image Privacy
MRI Foundation Models

Best for: Computer Vision Engineer, Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.