Safety
Dual-Granularity Orthogonal Disentanglement for Generalizable Audio Deepfake Detection
The paper introduces a dual-granularity orthogonal disentanglement framework for audio deepfake detection. It targets generalization across speakers by enforcing feature independence at both the sample level and the batch level. The model uses cosine orthogonality and cross-covariance regularization to reduce implicit identity leakage. It lowers the equal error rate (EER) on the ASVspoof 2019 LA, ASVspoof 2021 DF, and In-the-Wild datasets, outperforming existing methods by up to 2.60% in cross-dataset transfer. For practitioners, the approach simplifies the model architecture while making audio deepfake detection more robust across diverse datasets.
audiodeepfakedetection