Dual-Granularity Orthogonal Disentanglement for Generalizable Audio Deepfake Detection

2026-06-15 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

A new framework, Dual-Granularity Orthogonal Disentanglement, addresses the challenge of generalizable audio deepfake detection by preventing detectors from learning speaker-identity features instead of synthesis artifacts. This method tackles implicit identity leakage, a common issue causing poor generalization across speakers, without increasing architectural complexity or training instability. It enforces feature independence at two levels: sample-level cosine orthogonality for directional decorrelation and batch-level cross-covariance regularization for eliminating linear correlations across embedding dimensions. The framework incorporates a curriculum disentanglement schedule to progressively strengthen orthogonality constraints. Experiments on ASVspoof 2019 LA, ASVspoof 2021 DF, and In-the-Wild datasets yielded equal error rates (EER) of 1.35%, 7.88%, and 21.58%, respectively. Notably, it outperformed gradient reversal disentanglement by 2.60% absolute on cross-dataset transfer.

Key takeaway

AI security engineers building robust audio deepfake detectors should consider integrating dual-granularity orthogonal disentanglement. The approach directly targets implicit identity leakage, which improves generalization to unseen speakers and unseen data. Evaluate it on cross-dataset transfer benchmarks, where it showed a 2.60% absolute improvement over gradient reversal disentanglement.

Key insights

Dual-Granularity Orthogonal Disentanglement improves audio deepfake detection generalization by separating speaker identity from synthesis artifacts.

Principles

Feature independence improves deepfake generalization.
Orthogonality can prevent identity leakage.
Progressive constraint strengthening aids stability.

Method

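The framework enforces independence between speaker-identity features and synthesis-artifact features at two granularities. Sample-level cosine orthogonality decorrelates the direction of each utterance's paired embeddings, and batch-level cross-covariance regularization removes linear correlations across embedding dimensions. A curriculum disentanglement schedule progressively strengthens both constraints during training.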
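
The two penalties and the schedule described above can be sketched as follows. This is a minimal illustration in dependency-free Python, not the authors' implementation: the squared-cosine form, the mean-squared cross-covariance form, the linear ramp, and every function and parameter name (cosine_orthogonality_loss, cross_covariance_loss, curriculum_weight, warmup_steps, max_weight) are assumptions made for the example. A real training loop would compute the same quantities on tensors in a framework with automatic differentiation.

```python
import math


def cosine_orthogonality_loss(artifact, speaker, eps=1e-8):
    """Sample level: mean squared cosine similarity of paired embeddings.

    artifact[i] and speaker[i] are the two embeddings of utterance i.
    The squared-cosine form is an assumption for illustration.
    """
    total = 0.0
    for a, s in zip(artifact, speaker):
        dot = sum(x * y for x, y in zip(a, s))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_s = math.sqrt(sum(y * y for y in s))
        cos = dot / (norm_a * norm_s + eps)
        total += cos * cos
    return total / len(artifact)


def cross_covariance_loss(artifact, speaker):
    """Batch level: mean squared entry of the cross-covariance matrix.

    Each entry is the covariance, over the batch, between one artifact
    dimension and one speaker dimension. Requires a batch of at least 2.
    """
    n = len(artifact)
    d_a, d_s = len(artifact[0]), len(speaker[0])
    mean_a = [sum(row[j] for row in artifact) / n for j in range(d_a)]
    mean_s = [sum(row[k] for row in speaker) / n for k in range(d_s)]
    total = 0.0
    for j in range(d_a):
        for k in range(d_s):
            cov = sum(
                (artifact[i][j] - mean_a[j]) * (speaker[i][k] - mean_s[k])
                for i in range(n)
            ) / (n - 1)
            total += cov * cov
    return total / (d_a * d_s)


def curriculum_weight(step, warmup_steps, max_weight):
    """Linear ramp from 0 to max_weight over warmup_steps, then constant.

    The linear shape is an assumption; the source only says the
    constraints are strengthened progressively.
    """
    if warmup_steps <= 0:
        return max_weight
    return max_weight * min(1.0, step / warmup_steps)


def disentanglement_penalty(artifact, speaker, step,
                            warmup_steps=1000, max_weight=1.0):
    """Weighted sum of both penalties, to be added to the detection loss."""
    weight = curriculum_weight(step, warmup_steps, max_weight)
    return weight * (
        cosine_orthogonality_loss(artifact, speaker)
        + cross_covariance_loss(artifact, speaker)
    )
```

The two terms are not redundant. For the batch artifact = [[1, 0], [0, 2]] and speaker = [[0, 3], [4, 0]], every pair is exactly orthogonal, so the sample-level term is zero, yet the dimensions still co-vary across the batch and the batch-level term is positive.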

In practice

Apply dual-granularity disentanglement to audio deepfake models.
Implement curriculum scheduling for stable training.
Evaluate generalization with cross-dataset transfer tests.
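
Cross-dataset results such as those in the summary are reported as equal error rate (EER), the operating point where the false acceptance rate and false rejection rate coincide. The sketch below shows one simple way to estimate it from detector scores. It assumes higher scores mean "more likely bona fide" and approximates the EER at the threshold where the two error rates are closest. The names equal_error_rate and cross_dataset_report are hypothetical, and the quadratic-time loop is meant for readability rather than large evaluation sets.

```python
def equal_error_rate(bonafide_scores, spoof_scores):
    """Return (eer, threshold), assuming higher score = more bona fide."""
    thresholds = sorted(set(bonafide_scores) | set(spoof_scores))
    best = None
    for t in thresholds:
        # False rejection: bona fide utterance scored below the threshold.
        frr = sum(s < t for s in bonafide_scores) / len(bonafide_scores)
        # False acceptance: spoofed utterance scored at or above it.
        far = sum(s >= t for s in spoof_scores) / len(spoof_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


def cross_dataset_report(score_sets):
    """Map each evaluation set name to its EER in percent.

    score_sets: {name: (bonafide_scores, spoof_scores)}, all produced by
    a single model trained on one dataset and scored on the others.
    """
    return {
        name: 100.0 * equal_error_rate(bona, spoof)[0]
        for name, (bona, spoof) in score_sets.items()
    }
```

A model that relies on speaker identity tends to look strong on its training corpus and degrade on the others, so reporting the per-dataset EERs side by side is what exposes identity leakage.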

Topics

Audio Deepfake Detection
Feature Disentanglement
Generalization
Orthogonality
ASVspoof
Curriculum Learning

Best for: Research Scientist, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.