Synthetic but Not Realistic: The Evaluation Challenge in Generative Modelling for Structured Electronic Medical Records
Summary
A new multi-dimensional evaluation framework addresses the challenge of assessing synthetic electronic medical record (EMR) data, which is often proposed as a privacy-preserving substitute for real patient records. Current evaluation methods, which rely on statistical similarity and predictive performance, fail to capture clinical validity. The framework, grounded in epidemiology, evaluates descriptive fidelity, clinical utility, and structural validity, corresponding respectively to descriptive, predictive, and causal questions. Four generative paradigms (GAN-based, VAE-boosted, diffusion-based, and masked modelling) were tested on PRIME-CVD, a cohort of 50,000 people. Results indicate that while all models reproduce marginal distributions, none simultaneously preserves subgroup structure, effect estimates, and dependency structure. Models with strong distributional fidelity can still be poorly calibrated and distort relationships between variables, which leads to unreliable inference and to overestimates of synthetic data quality.
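The summary notes that models with strong distributional fidelity can still be poorly calibrated. As a minimal sketch, assuming calibration is summarized with binned expected calibration error (a common choice; the summary does not name the paper's calibration metric), the check below compares predicted risks with observed event rates. The helper name `expected_calibration_error` and the simulated risks are illustrative, not taken from the study.

```python
import numpy as np


# Hypothetical helper; ECE is an assumed choice of calibration metric, not the paper's.
def expected_calibration_error(y_true, y_prob, n_bins=10):
    """Binned expected calibration error (ECE).

    Splits predicted risks into equal-width bins on [0, 1] and returns the
    sample-weighted mean of |observed event rate - mean predicted risk|.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)
    # Bin index per prediction; a risk of exactly 1.0 goes into the last bin.
    bins = np.minimum((y_prob * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        in_bin = bins == b
        if in_bin.any():
            gap = abs(y_true[in_bin].mean() - y_prob[in_bin].mean())
            ece += in_bin.mean() * gap  # weight by the share of samples in this bin
    return ece


# Usage with simulated risks (not data from the study).
rng = np.random.default_rng(0)
true_risk = rng.uniform(0.05, 0.6, 20_000)
outcome = rng.binomial(1, true_risk)
well_calibrated = expected_calibration_error(outcome, true_risk)
# A model that systematically inflates risk by 50% can rank patients perfectly
# well yet still be badly calibrated.
inflated = expected_calibration_error(outcome, np.clip(true_risk * 1.5, 0, 1))
print(f"ECE well calibrated: {well_calibrated:.3f}, ECE inflated: {inflated:.3f}")
```

In an evaluation of synthetic data, `y_prob` would come from a model trained on synthetic records and `y_true` from held-out real records.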
Key takeaway
If you are a machine learning engineer developing synthetic EMR models, move beyond basic statistical metrics to ensure clinical validity. Even with strong distributional fidelity, your models may produce unreliable clinical inferences if they do not preserve subgroup and dependency structures. Prioritize domain-informed evaluation that assesses descriptive fidelity, clinical utility, and structural validity, so that you do not overestimate data quality and your scientific conclusions remain reliable.
Key insights
Evaluating synthetic EMR data requires a multi-dimensional framework beyond statistical similarity to ensure clinical validity.
Principles
- Clinical validity demands domain-informed assessment.
- Statistical fidelity does not guarantee reliable inference.
- Synthetic data must preserve subgroup and dependency structures.
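The second and third principles can be made concrete with a toy simulation. The sketch below assumes a hypothetical two-variable cohort (age and systolic blood pressure) and uses a deliberately naive "generator" that shuffles each column independently. This is not one of the paper's models; it simply shows that every marginal distribution can be reproduced exactly while the association that inference depends on disappears.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical "real" cohort: systolic blood pressure (SBP) rises with age,
# giving a correlation of about 0.6 by construction.
age = rng.normal(60, 10, n)
sbp = 100 + 0.6 * age + rng.normal(0, 8, n)
real = np.column_stack([age, sbp])

# Deliberately naive "generator": shuffle each column independently.
synthetic = np.column_stack(
    [rng.permutation(real[:, j]) for j in range(real.shape[1])]
)

# Every marginal distribution is reproduced exactly (same values, new order)...
marginals_match = all(
    np.array_equal(np.sort(real[:, j]), np.sort(synthetic[:, j]))
    for j in range(real.shape[1])
)
# ...but the age-SBP association, which downstream inference relies on, is gone.
r_real = np.corrcoef(real, rowvar=False)[0, 1]
r_syn = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(f"marginals identical: {marginals_match}, "
      f"r(real)={r_real:.2f}, r(synthetic)={r_syn:.2f}")
```

Any evaluation that only compares per-variable distributions would score this output as perfect.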
Method
The proposed framework assesses descriptive fidelity (descriptive questions), clinical utility (predictive questions), and structural validity (causal questions) for synthetic EMR data.
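For the clinical-utility (predictive) dimension, one widely used protocol is "train on synthetic, test on real" (TSTR): fit a model on synthetic records, score it on held-out real records, and compare the result with the same model trained on real data. The summary does not say that the paper uses TSTR, so treat this as an assumption. The sketch uses NumPy only, simulated data rather than PRIME-CVD, and hypothetical helpers (`fit_logistic`, `predict_risk`, `rank_auc`, `simulate`).

```python
import numpy as np


# Hypothetical helpers for illustration only.
def fit_logistic(X, y, n_iter=25):
    """Logistic regression fitted by Newton-Raphson; returns [intercept, coefs...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        hess = X1.T @ (X1 * (p * (1 - p))[:, None])
        beta += np.linalg.solve(hess, X1.T @ (y - p))
    return beta


def predict_risk(beta, X):
    X1 = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-X1 @ beta))


def rank_auc(y, score):
    """AUC via the Mann-Whitney rank statistic (assumes untied scores)."""
    ranks = np.empty(len(score))
    ranks[np.argsort(score)] = np.arange(1, len(score) + 1)
    n_pos = y.sum()
    n_neg = len(y) - n_pos
    return (ranks[y == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def simulate(n, rng, coefs=(1.2, 0.8)):
    """Hypothetical cohort: two standardized risk factors and a binary outcome."""
    X = rng.normal(size=(n, 2))
    logit = -1.0 + X @ np.asarray(coefs)
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-logit)))
    return X, y.astype(float)


rng = np.random.default_rng(1)
X_train, y_train = simulate(5_000, rng)          # real training data
X_test, y_test = simulate(5_000, rng)            # real held-out data
X_good, y_good = simulate(5_000, rng)            # stand-in for a faithful generator
# Stand-in for a distorted generator: same predictor marginals, but the outcome
# no longer depends on the first risk factor.
X_bad, y_bad = simulate(5_000, rng, coefs=(0.0, 0.8))

auc_trtr = rank_auc(y_test, predict_risk(fit_logistic(X_train, y_train), X_test))
auc_tstr = rank_auc(y_test, predict_risk(fit_logistic(X_good, y_good), X_test))
auc_distorted = rank_auc(y_test, predict_risk(fit_logistic(X_bad, y_bad), X_test))
print(f"TRTR {auc_trtr:.3f}  TSTR {auc_tstr:.3f}  distorted {auc_distorted:.3f}")
```

A faithful generator should give a TSTR AUC close to the real-data baseline, while the distorted one loses discrimination even though its predictor marginals are unchanged. AUC measures discrimination only; calibration needs a separate check.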
In practice
- Evaluate whether synthetic EMR data preserve subgroup structure.
- Assess whether effect estimates are preserved.
- Check the fidelity of the dependency structure.
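The three checks in the list above can be sketched as simple real-versus-synthetic comparisons. This is a minimal illustration under stated assumptions, not the paper's metrics: subgroup structure is summarized as outcome prevalence per subgroup, an effect estimate is stood in for by an ordinary least-squares slope (clinical studies more often use odds or hazard ratios with confidence intervals), and dependency structure is summarized by pairwise correlations. All function names are hypothetical.

```python
import numpy as np


# Hypothetical helpers (not from the paper).
def subgroup_prevalence_gap(real_group, real_y, syn_group, syn_y):
    """Largest absolute difference in outcome prevalence across shared subgroups.

    Raises ValueError if the two datasets share no subgroup labels.
    """
    shared = np.intersect1d(real_group, syn_group)
    return max(
        abs(real_y[real_group == g].mean() - syn_y[syn_group == g].mean())
        for g in shared
    )


def effect_estimate(x, y):
    """OLS slope of y on x with an intercept: a simple stand-in for an effect estimate."""
    X = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


def dependency_gap(real, syn):
    """Largest absolute difference between correlation matrices (columns = variables)."""
    return np.abs(
        np.corrcoef(real, rowvar=False) - np.corrcoef(syn, rowvar=False)
    ).max()


# Toy example with illustrative values only.
group = np.array([0, 0, 1, 1])
gap = subgroup_prevalence_gap(
    group, np.array([0.0, 1.0, 1.0, 1.0]), group, np.array([0.0, 0.0, 1.0, 1.0])
)  # subgroup 0: 0.5 real vs 0.0 synthetic
slope = effect_estimate(np.array([0.0, 1.0, 2.0, 3.0]), np.array([1.0, 3.0, 5.0, 7.0]))
# slope is approximately 2.0; compare it with the slope fitted on synthetic data.
```

Tolerances for each gap are a domain decision and are not given in the summary.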
Topics
- Synthetic Data Generation
- Electronic Medical Records
- Data Evaluation Frameworks
- Clinical Validity
- Generative Models
- Healthcare Data Privacy
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.