Your Synthetic Data Passed Every Test and Still Broke Your Model
Summary
Synthetic data that passes standard quality checks can still fail in production, because the usual metrics miss important aspects of data behavior. The "fidelity-utility-privacy" framework is conceptually sound, but practitioners often misapply it by evaluating the three dimensions sequentially and overlooking details within each. Common fidelity metrics such as KL divergence and the Kolmogorov-Smirnov test, as typically applied column by column, assess only marginal distributions and miss correlations between features. Utility metrics such as an aggregate TSTR (Train on Synthetic, Test on Real) AUC score conceal poor tail performance, so models fail on rare but important events. Privacy metrics such as membership inference risk focus on record-level identification and ignore attribute inference, where sensitive features can be deduced from quasi-identifiers. The article proposes an additional check for each dimension: a Correlation Drift Score for fidelity, TSTR stratified by target decile for utility, and an Attribute Inference Lift test for privacy. It emphasizes that evaluation thresholds must match the specific use case and access conditions.
Key takeaway
If you are a data scientist or MLOps engineer deploying synthetic data, your current evaluation framework likely has blind spots that can lead to production failures. Add correlation drift analysis, stratified TSTR utility scores, and attribute inference risk assessments to your validation pipeline. Define your privacy and utility thresholds from the specific use case and data access conditions before synthesis, rather than relying on a tool's default metrics, to prevent model degradation on edge cases and leaks of sensitive data.
Key insights
Standard synthetic data metrics often miss broken feature correlations, poor tail performance, and attribute inference risk.
Principles
- Zero risk equals zero utility for synthetic data.
- Define use case before evaluating synthetic data.
- Thresholds must align with specific use cases.
Method
Augment standard synthetic data evaluation with Correlation Drift Score for fidelity, TSTR stratified by decile for utility, and an Attribute Inference Lift test for privacy.
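As a rough illustration of the fidelity check, here is a minimal sketch of a correlation drift score. The summary does not give an exact definition, so this assumes Pearson correlations and an unnormalized Frobenius norm of the difference between the real and synthetic correlation matrices. The function name correlation_drift_score and the toy data are hypothetical.

```python
import numpy as np


def correlation_drift_score(real, synth):
    """Frobenius norm of the difference between two Pearson correlation matrices.

    Assumption: rows are records, columns are numeric features.
    """
    real_corr = np.corrcoef(real, rowvar=False)
    synth_corr = np.corrcoef(synth, rowvar=False)
    return float(np.linalg.norm(real_corr - synth_corr, ord="fro"))


# Toy data: both tables have roughly standard-normal marginals,
# but only the "real" table has correlated columns.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
real = np.column_stack([x, 0.8 * x + 0.6 * rng.normal(size=n)])
synth = np.column_stack([rng.normal(size=n), rng.normal(size=n)])

score_same = correlation_drift_score(real, real)    # identical tables
score_drift = correlation_drift_score(real, synth)  # correlation lost in synthesis
print(score_same, score_drift)
```

In this toy case the per-column distributions look alike, so a marginal test would likely pass, while the drift score is clearly above zero. What counts as an acceptable score depends on the number of features and on the use case.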
In practice
- Compute the Frobenius norm of the difference between the real and synthetic correlation matrices to measure correlation drift.
- Stratify TSTR by target variable deciles.
- Test attribute inference lift for sensitive features.
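The second and third items above could be sketched as follows, under several assumptions that are not from the article. For utility, a continuous target and per-decile mean absolute error stand in for the AUC the summary mentions, with a plain least-squares model as the downstream learner. For privacy, "lift" is assumed to mean the accuracy of a simple lookup attacker trained on synthetic data, divided by the accuracy of always guessing the majority class. All function names and toy data are hypothetical.

```python
from collections import Counter, defaultdict

import numpy as np


def tstr_mae_by_decile(X_synth, y_synth, X_real, y_real):
    """Fit least squares on synthetic data; return MAE on real data per target decile."""
    def design(X):
        return np.column_stack([np.ones(len(X)), X])  # add intercept column

    coef, *_ = np.linalg.lstsq(design(X_synth), y_synth, rcond=None)
    abs_err = np.abs(design(X_real) @ coef - y_real)
    # Nine interior quantiles of the real target define deciles 0..9.
    edges = np.quantile(y_real, np.linspace(0.1, 0.9, 9))
    decile = np.searchsorted(edges, y_real, side="right")
    return np.array([abs_err[decile == d].mean() for d in range(10)])


def attribute_inference_lift(synth_qi, synth_sensitive, real_qi, real_sensitive):
    """Attacker accuracy on real records divided by majority-class accuracy."""
    synth_rows = [tuple(r) for r in np.asarray(synth_qi).tolist()]
    real_rows = [tuple(r) for r in np.asarray(real_qi).tolist()]
    synth_s = np.asarray(synth_sensitive).tolist()
    real_s = np.asarray(real_sensitive).tolist()
    # Attacker: most common sensitive value per quasi-identifier combination.
    table = defaultdict(Counter)
    for row, s in zip(synth_rows, synth_s):
        table[row][s] += 1
    fallback = Counter(synth_s).most_common(1)[0][0]
    guesses = [table[r].most_common(1)[0][0] if r in table else fallback
               for r in real_rows]
    attack_acc = np.mean(np.array(guesses) == np.array(real_s))
    baseline_acc = Counter(real_s).most_common(1)[0][1] / len(real_s)
    return float(attack_acc / baseline_acc)


rng = np.random.default_rng(0)
n = 5000

# Utility demo: the real target has a steep tail that the synthetic data lacks.
x_real = rng.normal(size=n)
y_real = x_real + 3.0 * np.maximum(x_real - 1.5, 0.0) + 0.1 * rng.normal(size=n)
x_synth = rng.normal(size=n)
y_synth = x_synth + 0.1 * rng.normal(size=n)
mae_by_decile = tstr_mae_by_decile(x_synth, y_synth, x_real, y_real)


# Privacy demo: the sensitive flag depends strongly on the age band.
def sample(size):
    age_band = rng.integers(0, 5, size=size)
    region = rng.integers(0, 10, size=size)
    p = np.where(age_band >= 3, 0.9, 0.1)
    return np.column_stack([age_band, region]), (rng.random(size) < p).astype(int)


real_qi, real_s = sample(n)
synth_qi, synth_s = sample(n)
lift = attribute_inference_lift(synth_qi, synth_s, real_qi, real_s)
print(mae_by_decile.round(3), round(lift, 2))
```

In the utility demo the error in the top decile is far larger than in the middle deciles, which an aggregate score would average away. In the privacy demo a lift above 1 means the synthetic data helps an attacker infer the sensitive attribute better than a blind guess. How much lift is tolerable is a threshold to set per use case and access conditions.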
Topics
- Synthetic Data Quality
- Fidelity-Utility-Privacy Framework
- Correlation Drift Score
- Tail Loss Problem
- Attribute Inference Risk
Best for: Machine Learning Engineer, Data Scientist, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.