Patient privacy in the age of clinical AI: Scientists investigate memorization risk
Summary
MIT researchers, in a paper presented at NeurIPS 2025 and posted to the arXiv preprint server, investigated how artificial intelligence models trained on de-identified electronic health records (EHRs) can memorize patient-specific information. Memorization occurs when a model's output draws on a single patient's record rather than on knowledge generalized across many records, which can violate that patient's privacy. The team developed a rigorous testing setup for evaluating data leakage in a health care context, emphasizing that the risk of harm depends on how much the attacker already knows and how sensitive the leaked information is. The findings indicate that the more information an attacker has, the more likely leakage becomes, and that some leaks, such as an HIV diagnosis, are more harmful than others, such as demographic data. Patients with unique conditions are particularly vulnerable.
Key takeaway
For CTOs and VPs of Engineering overseeing AI deployments in healthcare, you should implement rigorous testing protocols that detect and mitigate memorization of patient-specific data in foundation models. Your evaluation should distinguish benign from harmful leaks and prioritize stronger protections for sensitive conditions and unique patient profiles, especially as legal and clinical expertise becomes integrated into model development.
Key insights
AI models trained on de-identified EHRs can memorize and leak patient data, necessitating rigorous privacy evaluation.
Principles
- Memorization differs from generalization.
- Leakage risk scales with attacker knowledge.
- Harm depends on data sensitivity.
Method
The research team developed a series of tests that measure different types of uncertainty and assess the practical risk to patients by evaluating several tiers of attack against EHR foundation models.
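A tiered attack can be illustrated with a minimal sketch, which is not the researchers' test suite. It assumes a hypothetical toy model (`make_toy_model`) that predicts a diagnosis from the training records matching a query, and a hypothetical metric (`leakage_gap`) defined as the probability the model assigns to a patient's true diagnosis minus the population baseline. Each tier gives the attacker one more known field; the fields and records are synthetic. For brevity the baseline here comes from the training set itself, whereas a real evaluation would estimate it from data the model never saw.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

Record = Dict[str, str]
Distribution = Dict[str, float]


def distribution(values: Sequence[str]) -> Distribution:
    """Empirical frequency of each value in `values`."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in counts.items()}


def make_toy_model(train: List[Record], sensitive: str) -> Callable[[Record], Distribution]:
    """Hypothetical stand-in for a trained EHR model. It predicts `sensitive`
    from the training records that agree with every field in the query and
    falls back to population frequencies when nothing matches."""
    population = distribution([r[sensitive] for r in train])

    def predict(query: Record) -> Distribution:
        matches = [r for r in train if all(r.get(k) == v for k, v in query.items())]
        return distribution([r[sensitive] for r in matches]) if matches else population

    return predict


def leakage_gap(predict: Callable[[Record], Distribution], baseline: Distribution,
                target: Record, known: Sequence[str], sensitive: str) -> float:
    """Probability the model assigns to the target's true sensitive value,
    minus the population baseline, for an attacker who knows `known` fields."""
    query = {k: target[k] for k in known}
    truth = target[sensitive]
    return predict(query).get(truth, 0.0) - baseline.get(truth, 0.0)


# Synthetic records; "age", "sex", "zip3", and "dx" are illustrative fields.
train = [
    {"age": "40-49", "sex": "F", "zip3": "021", "dx": "asthma"},
    {"age": "40-49", "sex": "F", "zip3": "021", "dx": "diabetes"},
    {"age": "40-49", "sex": "M", "zip3": "021", "dx": "asthma"},
    {"age": "40-49", "sex": "M", "zip3": "946", "dx": "HIV"},
]
model = make_toy_model(train, "dx")
baseline = distribution([r["dx"] for r in train])

# Attack tiers: each tier assumes the attacker knows one more field.
tiers = [["age"], ["age", "sex"], ["age", "sex", "zip3"]]
for idx in (0, 3):
    gaps = [leakage_gap(model, baseline, train[idx], known, "dx") for known in tiers]
    print(f"patient {idx}: {gaps}")
```

Running it prints `patient 0: [0.0, 0.0, 0.0]` and `patient 3: [0.0, 0.25, 0.75]`. Patient 0 shares a profile with another record, so the model never reveals more than population statistics. Patient 3 becomes unique once all three fields are known, and the gap grows with each tier, mirroring the finding that attacker knowledge and rare profiles both raise leakage risk.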
In practice
- Distinguish generalization from memorization.
- Prioritize protection for unique patient conditions.
- Evaluate leakage in a health care context.
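One way to find the unique patient profiles worth prioritizing is a k-anonymity check over quasi-identifiers. This is a standard de-identification technique, swapped in here for illustration rather than taken from the paper. The sketch counts how many records share each patient's quasi-identifier values and flags those shared by fewer than `k` records, listing highly sensitive diagnoses first. The helper names (`equivalence_class_sizes`, `triage`), the fields, the choice of `k=2`, and the HIV-only sensitivity set are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence, Set, Tuple

Record = Dict[str, str]


def equivalence_class_sizes(records: List[Record], quasi_ids: Sequence[str]) -> List[int]:
    """For each record, count the records (itself included) that share its
    values on every quasi-identifier field."""
    keys = [tuple(r[q] for q in quasi_ids) for r in records]
    counts = Counter(keys)
    return [counts[key] for key in keys]


def triage(records: List[Record], quasi_ids: Sequence[str], sensitive: str,
           high_sensitivity: Set[str], k: int = 2) -> List[Tuple[int, int, bool]]:
    """Flag records whose quasi-identifier profile is shared by fewer than k
    records (a k-anonymity violation). Returns (index, class size, is_sensitive)
    tuples with highly sensitive values first, then the smallest classes."""
    sizes = equivalence_class_sizes(records, quasi_ids)
    flagged = [
        (i, size, records[i][sensitive] in high_sensitivity)
        for i, size in enumerate(sizes)
        if size < k
    ]
    # `not is_sensitive` is False for sensitive rows, so they sort first.
    return sorted(flagged, key=lambda t: (not t[2], t[1], t[0]))


# Synthetic records; the field names are illustrative only.
records = [
    {"age": "40-49", "sex": "F", "zip3": "021", "dx": "asthma"},
    {"age": "40-49", "sex": "F", "zip3": "021", "dx": "diabetes"},
    {"age": "40-49", "sex": "M", "zip3": "021", "dx": "asthma"},
    {"age": "40-49", "sex": "M", "zip3": "946", "dx": "HIV"},
    {"age": "70-79", "sex": "F", "zip3": "946", "dx": "asthma"},
]
print(triage(records, ["age", "sex", "zip3"], "dx", high_sensitivity={"HIV"}, k=2))
```

This prints `[(3, 1, True), (2, 1, False), (4, 1, False)]`: the unique patient with an HIV diagnosis is flagged first, followed by the other unique profiles. A k-anonymity check only measures how identifiable a record is in the data; it does not test whether a trained model has memorized that record, so it complements a leakage probe rather than replacing one.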
Topics
- AI Memorization
- Patient Privacy
- Electronic Health Records
- Foundation Models
- Healthcare AI
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Researcher, AI Ethicist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by News on Artificial Intelligence and Machine Learning.