When AUC Misleads: Polarization-Aware Evaluation of Deepfake Detectors under Domain Shift
Summary
The paper introduces a new metric, Cross-dataset AUC (Cross-AUC), for evaluating deepfake detectors more realistically than conventional Area Under the ROC Curve (AUC) protocols allow. Current evaluations report AUC separately on each of several datasets, which fails to capture real-world scenarios involving mixed data sources and diverse artifact types. Cross-AUC combines the average of per-domain AUCs with a measure of prediction polarization, quantified by the Wasserstein distance between class score distributions, to account for robustness to domain shift. The metric gives a more accurate assessment of generalization and is also interpretable, since the polarization term helps explain performance drops. Its practical relevance is demonstrated through experiments on seven benchmark datasets.
Key takeaway
For machine learning engineers developing deepfake detection systems, Cross-dataset AUC (Cross-AUC) offers a more realistic evaluation of model generalization. Per-dataset AUC can hide performance problems that appear when a model encounters diverse, unseen manipulations or mixed data sources. Consider adding Cross-AUC to your evaluation pipeline to assess robustness to domain shift and to understand why your detector's performance might degrade in real-world deployments.
Key insights
Cross-AUC offers a polarization-aware evaluation metric for deepfake detectors, improving generalization assessment under domain shift.
Principles
- Traditional AUC misleads on mixed data.
- Real-world deepfake detection needs polarization awareness.
- Domain shift robustness is key.
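The first principle can be illustrated with a toy example. The sketch below is not taken from the paper: the scores are hypothetical, and `auc` is a small helper written here for illustration. It shows a detector that separates real from fake perfectly within each domain, yet ranks samples worse once the two domains are pooled, because the domains place their scores on different scales.

```python
def auc(real_scores, fake_scores):
    """ROC AUC: probability that a random fake outscores a random real (ties count half)."""
    wins = 0.0
    for fake in fake_scores:
        for real in real_scores:
            if fake > real:
                wins += 1.0
            elif fake == real:
                wins += 0.5
    return wins / (len(real_scores) * len(fake_scores))


# Hypothetical detector scores (higher = more likely fake) for two domains.
# Domain B's scores sit on a higher scale than domain A's.
domains = {
    "A": {"real": [0.1, 0.2, 0.3], "fake": [0.4, 0.5, 0.6]},
    "B": {"real": [0.5, 0.6, 0.7], "fake": [0.8, 0.9, 1.0]},
}

# AUC measured separately on each domain.
per_domain_auc = {name: auc(d["real"], d["fake"]) for name, d in domains.items()}

# Pool both domains, as a detector facing mixed sources would see them.
pooled_real = [s for d in domains.values() for s in d["real"]]
pooled_fake = [s for d in domains.values() for s in d["fake"]]
pooled_auc = auc(pooled_real, pooled_fake)

print(per_domain_auc)
print(round(pooled_auc, 3))
```

With these toy scores, each domain has an AUC of 1.0 on its own, while the pooled AUC is 29/36, roughly 0.806. Reporting only the per-domain numbers would hide that gap.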
Method
Cross-AUC combines the average of per-domain AUCs with a measure of prediction polarization. The extent of polarization is quantified by the Wasserstein distance between class score distributions.
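The two ingredients named above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not say how the averaged AUC and the polarization term are combined, so the sketch only computes them side by side. The helper names (`auc`, `wasserstein_1d`) and all scores are hypothetical.

```python
from bisect import bisect_right
from statistics import mean


def auc(real_scores, fake_scores):
    """ROC AUC: probability that a random fake outscores a random real (ties count half)."""
    wins = 0.0
    for fake in fake_scores:
        for real in real_scores:
            if fake > real:
                wins += 1.0
            elif fake == real:
                wins += 0.5
    return wins / (len(real_scores) * len(fake_scores))


def wasserstein_1d(u, v):
    """Wasserstein-1 distance between two empirical 1-D distributions.

    Computed as the integral of |CDF_u - CDF_v| over the pooled support.
    """
    u, v = sorted(u), sorted(v)
    grid = sorted(set(u) | set(v))
    total = 0.0
    for left, right in zip(grid, grid[1:]):
        cdf_u = bisect_right(u, left) / len(u)
        cdf_v = bisect_right(v, left) / len(v)
        total += abs(cdf_u - cdf_v) * (right - left)
    return total


# Hypothetical scores (higher = more likely fake) for two domains.
domains = {
    "polarized": {"real": [0.1, 0.2, 0.3], "fake": [0.7, 0.8, 0.9]},
    "weakly_polarized": {"real": [0.40, 0.45, 0.50], "fake": [0.55, 0.60, 0.65]},
}

report = {
    name: {
        "auc": auc(d["real"], d["fake"]),
        # Polarization: how far apart the real and fake score distributions sit.
        "polarization": wasserstein_1d(d["real"], d["fake"]),
    }
    for name, d in domains.items()
}
mean_auc = mean(r["auc"] for r in report.values())

for name, r in report.items():
    print(name, r["auc"], round(r["polarization"], 3))
print("mean per-domain AUC:", mean_auc)
```

Both toy domains reach an AUC of 1.0, but the Wasserstein distance between their class score distributions differs (about 0.6 versus 0.15). AUC alone cannot distinguish these two cases, which is the kind of information a polarization term adds.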
In practice
- Evaluate deepfake detectors under domain shift.
- Interpret reasons for performance drops.
- Assess robustness to mixed data sources.
Topics
- Deepfake Detection
- Evaluation Metrics
- Domain Shift
- AUC
- Wasserstein Distance
- Generative AI
Best for: Research Scientist, AI Scientist, Computer Vision Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.