Synthetic but Not Realistic: The Evaluation Challenge in Generative Modelling for Structured Electronic Medical Records

2026-06-08 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, AI in Healthcare · Depth: Advanced, quick

Summary

Synthetic electronic medical record (EMR) data are often proposed as privacy-preserving substitutes for real patient records, but current evaluation methods, which rely on statistical similarity and predictive performance, fail to capture clinical validity. The article proposes a multi-dimensional evaluation framework grounded in epidemiology. It assesses descriptive fidelity, clinical utility, and structural validity, which correspond to descriptive, predictive, and causal questions respectively. Four generative paradigms (GAN-based, VAE-based, diffusion-based, and masked modelling) were tested on PRIME-CVD, a 50,000-person cohort. All models reproduce marginal distributions, but none simultaneously preserves subgroup structure, effect estimates, and dependency structure. Models with strong distributional fidelity can still exhibit poor calibration and distorted relationships, so inference drawn from them is unreliable and conventional metrics overestimate synthetic data quality.

Key takeaway

Machine learning engineers developing synthetic EMR models must move beyond basic statistical metrics to ensure clinical validity. A model with strong distributional fidelity may still produce unreliable clinical inferences if subgroup and dependency structures are not preserved. Prioritize domain-informed evaluation that assesses descriptive fidelity, clinical utility, and structural validity, so that data quality is not overestimated and scientific conclusions remain reliable.

Key insights

Evaluating synthetic EMR data requires a multi-dimensional framework beyond statistical similarity to ensure clinical validity.

Principles

Clinical validity demands domain-informed assessment.
Statistical fidelity does not guarantee reliable inference.
Synthetic data must preserve subgroup and dependency structures.
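
The gap between marginal fidelity and dependency structure can be shown with a toy example. The sketch below is not from the article: the variables (age, systolic blood pressure), the linear relationship, and the shuffling "generator" are all assumptions chosen for illustration. It builds a deliberately bad synthetic cohort whose columns have exactly the same values as the real one, so any per-column similarity metric is perfect, while the age and blood pressure relationship is destroyed.

```python
import random

def pearson(xs, ys):
    """Pearson correlation coefficient, written out to avoid external dependencies."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

rng = random.Random(0)

# Hypothetical "real" cohort: blood pressure rises with age (assumed relationship).
age = [rng.uniform(30, 80) for _ in range(2000)]
sbp = [100 + 0.6 * a + rng.gauss(0, 8) for a in age]

# Deliberately bad "synthetic" cohort: same column values, shuffled independently.
synth_age = age[:]
synth_sbp = sbp[:]
rng.shuffle(synth_sbp)

# Each column has an identical empirical distribution in both cohorts.
marginals_match = sorted(age) == sorted(synth_age) and sorted(sbp) == sorted(synth_sbp)

real_r = pearson(age, sbp)
synth_r = pearson(synth_age, synth_sbp)

print(f"marginals identical: {marginals_match}")
print(f"real correlation:      {real_r:.2f}")
print(f"synthetic correlation: {synth_r:.2f}")
```

A real generative model fails less drastically than an independent shuffle, but the same logic applies: a score computed one column at a time cannot detect a weakened or distorted relationship between columns.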

Method

The proposed framework assesses descriptive fidelity (descriptive questions), clinical utility (predictive questions), and structural validity (causal questions) for synthetic EMR data.
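
For the predictive dimension, the summary notes that models can show poor calibration despite good distributional fidelity. One common way to quantify calibration is expected calibration error (ECE); whether the article uses this metric is not stated, so treat the following as a minimal sketch under that assumption. The function name, the bin count, and the example numbers are hypothetical.

```python
def expected_calibration_error(probs, outcomes, n_bins=10):
    """Weighted mean gap between predicted risk and observed event rate per bin."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        # Equal-width bins on [0, 1]; p == 1.0 falls in the last bin.
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        mean_pred = sum(p for p, _ in b) / len(b)
        observed = sum(y for _, y in b) / len(b)
        ece += (len(b) / n) * abs(mean_pred - observed)
    return ece

# Hypothetical case: a model trained on synthetic data predicts 90% risk
# for ten real patients, of whom only half experience the event.
overconfident = expected_calibration_error([0.9] * 10, [1, 0] * 5)

# Hypothetical perfectly calibrated case for comparison.
calibrated = expected_calibration_error([0.0] * 5 + [1.0] * 5, [0] * 5 + [1] * 5)

print(f"overconfident ECE: {overconfident:.2f}")
print(f"calibrated ECE:    {calibrated:.2f}")
```

In a train-on-synthetic, test-on-real setup, the probabilities would come from a model fitted to synthetic records and the outcomes from held-out real patients.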

In practice

Check that synthetic EMR data preserve subgroup structure.
Assess whether effect estimates are preserved.
Check dependency structure fidelity.
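
The subgroup and effect-estimate checks above can be sketched together by comparing an odds ratio between real and synthetic data within each subgroup. This is an illustrative sketch, not the article's procedure: the field names (sex, smoker, cvd), the counts, and the choice of the log odds ratio gap as the comparison are all assumptions. A 0.5 continuity correction is added to every cell so that empty cells do not cause division by zero.

```python
import math
from collections import defaultdict

def odds_ratio(records, exposure, outcome):
    """Odds ratio from a 2x2 table, with 0.5 added to every cell."""
    a = b = c = d = 0.5
    for r in records:
        if r[exposure] and r[outcome]:
            a += 1  # exposed, event
        elif r[exposure]:
            b += 1  # exposed, no event
        elif r[outcome]:
            c += 1  # unexposed, event
        else:
            d += 1  # unexposed, no event
    return (a * d) / (b * c)

def subgroup_effect_gaps(real, synth, group_key, exposure, outcome):
    """Absolute difference in log odds ratio, real vs synthetic, per subgroup."""
    def by_group(records):
        groups = defaultdict(list)
        for r in records:
            groups[r[group_key]].append(r)
        return groups

    real_g, synth_g = by_group(real), by_group(synth)
    return {
        g: abs(math.log(odds_ratio(real_g[g], exposure, outcome))
               - math.log(odds_ratio(synth_g[g], exposure, outcome)))
        for g in real_g if g in synth_g
    }

def cohort(sex, exp_event, exp_none, unexp_event, unexp_none):
    """Build hypothetical records from the four cell counts of a 2x2 table."""
    cells = [(1, 1, exp_event), (1, 0, exp_none), (0, 1, unexp_event), (0, 0, unexp_none)]
    return [{"sex": sex, "smoker": s, "cvd": c} for s, c, n in cells for _ in range(n)]

# Hypothetical counts. In women, the synthetic data keep the same smoking
# prevalence (100/200) and event rate (60/200) but lose the association.
real = cohort("F", 40, 60, 20, 80) + cohort("M", 50, 50, 25, 75)
synth = cohort("F", 30, 70, 30, 70) + cohort("M", 50, 50, 25, 75)

gaps = subgroup_effect_gaps(real, synth, "sex", "smoker", "cvd")
for group, gap in sorted(gaps.items()):
    print(f"{group}: log odds ratio gap = {gap:.2f}")
```

Working on the log scale treats a halving and a doubling of the odds ratio as equally large errors. In a real evaluation the gap would be judged against the sampling uncertainty of the real-data estimate, which this sketch omits.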

Topics

Synthetic Data Generation
Electronic Medical Records
Data Evaluation Frameworks
Clinical Validity
Generative Models
Healthcare Data Privacy

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.