Safety
Robust Spoofed Speech Detection via Temporal Pyramid Modeling
The paper introduces a Temporal Pyramid Adapter for spoofed speech detection. The adapter runs temporal convolutions in parallel, each with a different receptive field, so the model can pick up spoofing cues at multiple time scales. Built on self-supervised XLS-R representations, the model reports a state-of-the-art AUC of 99.24% and an EER of 3.87% on the PartialSpoof dataset, outperforming baselines such as LCNN-BLSTM and TRACE. For practitioners, the work addresses the challenge of cross-dataset generalization and shows that adaptation strategies matter for maintaining performance across domains and languages.
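To make the multi-scale idea concrete, here is a minimal pure-Python sketch of parallel temporal convolutions with different receptive fields. It is not the paper's implementation: the function names `conv1d_same` and `temporal_pyramid`, the kernel sizes (3, 5, 7), the fixed averaging weights, and fusion by per-frame concatenation are all assumptions made for illustration. A real adapter would learn its weights and operate on high-dimensional XLS-R frame embeddings rather than one scalar per frame.

```python
from typing import List, Sequence


def conv1d_same(frames: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """Zero-padded 1-D convolution that keeps the input length."""
    half = len(kernel) // 2
    out = []
    for t in range(len(frames)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t + k - half
            if 0 <= idx < len(frames):  # positions outside the signal count as zero
                acc += w * frames[idx]
        out.append(acc)
    return out


def temporal_pyramid(
    frames: Sequence[float], kernel_sizes: Sequence[int] = (3, 5, 7)
) -> List[List[float]]:
    """Run one branch per kernel size and return one fused vector per frame."""
    branches = []
    for size in kernel_sizes:
        # Placeholder averaging weights (assumption); a trained adapter learns these.
        kernel = [1.0 / size] * size
        branches.append(conv1d_same(frames, kernel))
    # Fuse by concatenating branch outputs frame by frame (assumed fusion strategy).
    return [list(per_frame) for per_frame in zip(*branches)]


# Toy input: a single anomalous frame in an otherwise flat signal.
frames = [0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0]
features = temporal_pyramid(frames)
```

In this toy input, only the widest branch responds at the first frame, because the anomaly at index 3 falls inside the size-7 window but outside the size-3 and size-5 windows. Short kernels localize brief artifacts, while long kernels see the same artifact from farther away.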
spoofed speech detection, voice conversion, security