Research
From Self-Supervised Speech Models to Mixture-of-Experts for Robust Anti-Spoofing
This work presents a Mixture-of-Experts (MoE) architecture derived from a self-supervised speech representation model to make anti-spoofing systems more robust. Multiple expert networks are combined through a layer-wise gating mechanism, so the model can capture diverse acoustic patterns while still leveraging the pre-trained representations. Evaluated across 14 spoofing datasets, the approach reduces the macro Equal Error Rate (EER) from 5.46% to 4.81%, an 11.9% relative improvement. The gain is relevant to practitioners who need more reliable spoofing detection for synthetic speech.
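The relative improvement follows directly from the two reported macro EER values. The short Python sketch below reproduces that arithmetic. The function names are illustrative, and the reading of "macro" as an unweighted mean over per-dataset EERs is an assumption, since the summary does not spell out the averaging.

```python
from statistics import mean


def macro_eer(per_dataset_eers):
    """Unweighted mean of per-dataset EERs in percent.

    Assumption: "macro" means every dataset counts equally,
    regardless of how many trials it contains.
    """
    return mean(per_dataset_eers)


def relative_reduction(before, after):
    """Relative reduction from `before` to `after`, in percent."""
    return 100.0 * (before - after) / before


# Macro EER values as reported in the summary (percent).
eer_before = 5.46
eer_after = 4.81

absolute_gain = eer_before - eer_after
relative_gain = relative_reduction(eer_before, eer_after)

print(f"absolute: {absolute_gain:.2f} points, relative: {relative_gain:.1f}%")
# prints "absolute: 0.65 points, relative: 11.9%"
```

The absolute gain is 0.65 percentage points. Dividing by the starting value of 5.46% gives the 11.9% relative figure.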
anti-spoofing, speech models, mixture-of-experts