Dual-Granularity Orthogonal Disentanglement for Generalizable Audio Deepfake Detection
Summary
A new framework, Dual-Granularity Orthogonal Disentanglement, targets generalizable audio deepfake detection by preventing detectors from learning speaker-identity features instead of synthesis artifacts. It tackles implicit identity leakage, a common cause of poor generalization across speakers, without adding architectural complexity or destabilizing training. It enforces feature independence at two levels: sample-level cosine orthogonality for directional decorrelation, and batch-level cross-covariance regularization to eliminate linear correlations across embedding dimensions. A curriculum disentanglement schedule progressively strengthens the orthogonality constraints. Experiments on the ASVspoof 2019 LA, ASVspoof 2021 DF, and In-the-Wild datasets yielded equal error rates (EER) of 1.35%, 7.88%, and 21.58%, respectively. The framework also outperformed gradient reversal disentanglement by 2.60% absolute on cross-dataset transfer.
Key takeaway
AI security engineers building robust audio deepfake detection systems should consider integrating dual-granularity orthogonal disentanglement. The approach directly addresses implicit identity leakage, improving generalization across diverse speakers and unseen data. Evaluate it on cross-dataset transfer benchmarks, where it showed a 2.60% absolute improvement over gradient reversal methods, to judge whether it makes your deepfake countermeasures more reliable.
Key insights
Dual-Granularity Orthogonal Disentanglement improves audio deepfake detection generalization by separating speaker identity from synthesis artifacts.
Principles
- Feature independence improves deepfake generalization.
- Orthogonality can prevent identity leakage.
- Progressive constraint strengthening aids stability.
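The last principle, progressive constraint strengthening, can be illustrated with a simple weight schedule. The summary does not say which schedule shape the framework uses, so this is a minimal sketch under stated assumptions: the linear ramp, the function names `disentanglement_weight` and `total_loss`, and the parameters `ramp_epochs` and `max_weight` are all hypothetical.

```python
def disentanglement_weight(epoch, ramp_epochs=10, max_weight=1.0):
    """Return the weight applied to the orthogonality penalty at a given epoch.

    Assumption: a linear ramp from 0 to max_weight over ramp_epochs.
    The source only states that constraints are strengthened progressively.
    """
    if ramp_epochs <= 0:
        return max_weight
    # Clamp progress to [0, 1] so the weight never exceeds max_weight.
    progress = min(max(epoch / ramp_epochs, 0.0), 1.0)
    return max_weight * progress


def total_loss(detection_loss, orthogonality_loss, epoch):
    """Combine the detection loss with the scheduled orthogonality penalty."""
    return detection_loss + disentanglement_weight(epoch) * orthogonality_loss
```

Starting at a weight of zero lets the detector learn a useful representation before the independence constraint begins to pull the embeddings apart, which is one plausible reading of why a curriculum helps stability.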
Method
The framework uses sample-level cosine orthogonality and batch-level cross-covariance regularization to enforce feature independence. A curriculum disentanglement schedule progressively strengthens these constraints.
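The two granularities can be sketched as follows. This is an illustration and not the paper's exact formulation: it assumes the model produces two embeddings per utterance (here called `spoof_emb` and `speaker_emb`, both hypothetical names), penalizes the squared cosine similarity per sample, and penalizes the mean squared entry of the batch cross-covariance matrix. It is written in plain Python for readability. In a real training loop these would be differentiable tensor operations.

```python
import math


def cosine_orthogonality_loss(spoof_emb, speaker_emb, eps=1e-8):
    """Sample-level term: mean squared cosine similarity of paired embeddings.

    Assumption: squaring the cosine is one common way to push each pair
    toward a 90-degree angle; the source does not give the exact form.
    """
    total = 0.0
    for a, b in zip(spoof_emb, speaker_emb):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        cos = dot / (norm_a * norm_b + eps)
        total += cos * cos
    return total / len(spoof_emb)


def cross_covariance_loss(spoof_emb, speaker_emb):
    """Batch-level term: mean squared entry of the cross-covariance matrix.

    Requires a batch of at least two samples (unbiased estimate, n - 1).
    """
    n = len(spoof_emb)
    d_a, d_b = len(spoof_emb[0]), len(speaker_emb[0])
    # Per-dimension batch means, used to center both embedding sets.
    mean_a = [sum(row[i] for row in spoof_emb) / n for i in range(d_a)]
    mean_b = [sum(row[j] for row in speaker_emb) / n for j in range(d_b)]
    total = 0.0
    for i in range(d_a):
        for j in range(d_b):
            cov = sum(
                (spoof_emb[k][i] - mean_a[i]) * (speaker_emb[k][j] - mean_b[j])
                for k in range(n)
            ) / (n - 1)
            total += cov * cov
    return total / (d_a * d_b)
```

The two terms are complementary. The cosine term looks at each utterance in isolation, while the cross-covariance term only sees structure that emerges across the batch, so embeddings can be orthogonal per sample yet still linearly correlated dimension by dimension.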
In practice
- Apply dual-granularity disentanglement to audio deepfake detection models.
- Implement curriculum scheduling for stable training.
- Evaluate cross-dataset transfer for generalization.
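For the evaluation step, the reported metric is the equal error rate. A minimal sketch of an EER computation is shown below, assuming that higher scores mean "more likely bona fide" and that the scores come from a model trained on one dataset and applied to another. The function name `equal_error_rate` and the threshold sweep are illustrative. Official benchmark toolkits may interpolate between thresholds and give slightly different values.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Approximate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted as bona fide when score >= threshold.
    Runs in quadratic time, so it is meant for small illustrative inputs.
    """
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best_gap, eer = float("inf"), 1.0
    for t in thresholds:
        # False rejection rate: bona fide trials scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance rate: spoofed trials scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if gap < best_gap:
            # Keep the operating point where the two error rates are closest.
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

Computing this once on the training dataset's evaluation split and once on an unseen dataset gives a direct view of how much performance is lost under transfer.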
Topics
- Audio Deepfake Detection
- Feature Disentanglement
- Generalization
- Orthogonality
- ASVspoof
- Curriculum Learning
Best for: Research Scientist, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.