Patient privacy in the age of clinical AI: Scientists investigate memorization risk

2026-01-06 · Source: News on Artificial Intelligence and Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Advanced, short

Summary

In a paper presented at NeurIPS 2025 and posted to the arXiv preprint server, MIT researchers investigated how artificial intelligence models trained on de-identified electronic health records (EHRs) can memorize patient-specific information. This "memorization" occurs when a model draws on a single patient record to produce its output rather than generalizing from many records, which can violate that patient's privacy. The team built a rigorous testing setup to evaluate data leakage in a health care context, emphasizing that the risk of harm depends on the attacker's prior knowledge and the sensitivity of the leaked information. The findings indicate that the more information an attacker already holds, the more likely a leak becomes, and that some leaks, such as an HIV diagnosis, are more harmful than others, such as demographic data. Patients with unique conditions are particularly vulnerable.

Key takeaway

CTOs and VPs of Engineering overseeing AI deployments in health care should implement rigorous testing protocols to detect and mitigate memorization of patient-specific data in foundation models. Evaluations should distinguish benign leaks from harmful ones and prioritize stronger protections for sensitive conditions and unique patient profiles, especially as interdisciplinary legal and clinical expertise becomes integrated into model development.

Key insights

AI models trained on de-identified EHRs can memorize and leak patient data, necessitating rigorous privacy evaluation.

Principles

Memorization differs from generalization.
Leakage risk scales with attacker knowledge.
Harm depends on data sensitivity.
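
One common way to make the first principle concrete is a leave-one-out comparison: if a correct prediction about a patient disappears when only that patient's record is withheld from training, the output depends on a single record rather than on a pattern shared across many. The sketch below is an illustration under stated assumptions, not the MIT team's procedure. The records are made up, and `fit_and_predict` is a hypothetical stand-in (a simple majority vote over matching records) for training and querying a real model.

```python
from collections import Counter

# Made-up toy records; field names and values are illustrative only.
TRAIN = [
    {"age_band": "40-49", "sex": "F", "diagnosis": "asthma"},
    {"age_band": "40-49", "sex": "F", "diagnosis": "asthma"},
    {"age_band": "40-49", "sex": "F", "diagnosis": "asthma"},
    {"age_band": "70-79", "sex": "M", "diagnosis": "rare_condition"},
]


def fit_and_predict(train, query):
    """Hypothetical stand-in for train-then-query.

    Returns the majority diagnosis among training records whose fields
    match the query, or None if no record matches.
    """
    matches = [
        r["diagnosis"]
        for r in train
        if all(r[k] == v for k, v in query.items())
    ]
    return Counter(matches).most_common(1)[0][0] if matches else None


def depends_on_single_record(train, index, target="diagnosis"):
    """True if the correct prediction for record `index` is lost
    when only that record is held out of the training data."""
    record = train[index]
    query = {k: v for k, v in record.items() if k != target}
    with_record = fit_and_predict(train, query)
    without_record = fit_and_predict(train[:index] + train[index + 1:], query)
    return with_record == record[target] and without_record != record[target]


for i in range(len(TRAIN)):
    print(i, depends_on_single_record(TRAIN, i))
```

In this toy data the asthma patients are still predicted correctly when any one of them is held out, which looks like generalization, while the single rare-condition patient is recovered only when their own record is present.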

Method

The research team developed a series of tests that measure various types of uncertainty and assess practical risk by evaluating EHR foundation models against different tiers of possible attack.
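
The idea of attack tiers can be sketched as a loop over increasing amounts of attacker knowledge, measuring at each tier how often a sensitive field is recovered. This is a minimal sketch under stated assumptions, not the paper's test suite: the records, the tier definitions in `TIERS`, and `toy_model_predict` (a lookup standing in for a real EHR foundation model) are all hypothetical.

```python
from collections import Counter

# Made-up toy records; field names and values are illustrative only.
RECORDS = [
    {"age_band": "40-49", "sex": "F", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "sex": "F", "zip3": "021", "diagnosis": "asthma"},
    {"age_band": "40-49", "sex": "F", "zip3": "024", "diagnosis": "hiv"},
    {"age_band": "40-49", "sex": "M", "zip3": "021", "diagnosis": "diabetes"},
    {"age_band": "60-69", "sex": "M", "zip3": "021", "diagnosis": "diabetes"},
]

# Hypothetical tiers: each one assumes the attacker already knows more fields.
TIERS = {
    "tier_1": ["age_band"],
    "tier_2": ["age_band", "sex"],
    "tier_3": ["age_band", "sex", "zip3"],
}


def toy_model_predict(known):
    """Stand-in for a trained model: the most common diagnosis among
    training records that match everything the attacker knows."""
    matches = [
        r["diagnosis"]
        for r in RECORDS
        if all(r[k] == v for k, v in known.items())
    ]
    return Counter(matches).most_common(1)[0][0] if matches else None


def leakage_rate(fields, target="diagnosis"):
    """Fraction of records whose target value is recovered when the
    attacker supplies only the given fields."""
    hits = 0
    for record in RECORDS:
        known = {k: record[k] for k in fields}
        if toy_model_predict(known) == record[target]:
            hits += 1
    return hits / len(RECORDS)


for name, fields in TIERS.items():
    print(name, leakage_rate(fields))
```

On this toy data the recovery rate rises from tier to tier, matching the reported finding that more attacker information makes leakage more likely. Note that a recovery rate alone does not separate memorization from generalization; a real evaluation would also need a baseline from a model that never saw the records.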

In practice

Distinguish generalization from memorization.
Prioritize protection for unique patient conditions.
Evaluate leakage in a health care context.
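
As a rough illustration of how these practices might feed a triage step, the sketch below ranks records by combining how rare a patient's profile is with how sensitive the diagnosis would be if leaked. It is an assumption-laden example, not a method from the study: the scoring rule (sensitivity weight divided by the number of records sharing the same profile), the weights in `SENSITIVITY`, and the field names are all hypothetical, and real weights would need clinical and legal input.

```python
from collections import Counter

# Made-up toy records; field names and values are illustrative only.
RECORDS = [
    {"age_band": "40-49", "sex": "F", "diagnosis": "asthma"},
    {"age_band": "40-49", "sex": "F", "diagnosis": "asthma"},
    {"age_band": "70-79", "sex": "M", "diagnosis": "hiv"},
]

# Hypothetical sensitivity weights (higher means a leak is more harmful).
SENSITIVITY = {"hiv": 10.0, "asthma": 2.0}


def protection_priority(records, quasi_identifiers, sensitivity,
                        default_weight=1.0):
    """Score each record as sensitivity weight / profile group size.

    Records with a rare profile and a sensitive diagnosis score highest.
    """
    keys = [tuple(r[q] for q in quasi_identifiers) for r in records]
    group_sizes = Counter(keys)
    scores = []
    for record, key in zip(records, keys):
        weight = sensitivity.get(record["diagnosis"], default_weight)
        scores.append(weight / group_sizes[key])
    return scores


scores = protection_priority(RECORDS, ["age_band", "sex"], SENSITIVITY)
print(scores)
```

Here the patient with a unique profile and an HIV diagnosis scores far above the two patients who share a profile and a less sensitive diagnosis, so that record would be reviewed first.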

Topics

AI Memorization
Patient Privacy
Electronic Health Records
Foundation Models
Healthcare AI

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Researcher, AI Ethicist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by News on Artificial Intelligence and Machine Learning.