L-Proto: Language-Aware Episodic Prototypical Training for Multilingual Speaker Verification
L-Proto is a language-aware episodic prototypical training strategy for multilingual speaker verification that targets language-dependent acoustic variability. Every speaker in a training episode is drawn from the same language. Language cues therefore cannot separate the classes within an episode, which pushes the embedding to rely on speaker identity rather than linguistic characteristics. On the TidyVoice Challenge benchmark, L-Proto outperforms both conventional fine-tuning and episodic training with random sampling across several backbone architectures, which makes it directly relevant to practitioners working with multilingual speech data.
speaker verification, multilingual, episodic training
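To make the sampling idea concrete, here is a minimal sketch of a language-aware episode sampler paired with a standard prototypical loss. It is not the authors' implementation. It rests on several assumptions: each utterance is tagged with one speaker and one language, episodes are N-way with K support and Q query utterances per speaker, and logits are negative squared Euclidean distances to class prototypes, as in the original prototypical networks. L-Proto's actual distance function, episode sizes, and handling of multilingual speakers are not given in this summary. The names `build_index`, `sample_language_episode`, and `prototypical_loss` are hypothetical.

```python
import math
import random
from collections import defaultdict


def build_index(utterances):
    """Group (utt_id, speaker, language) triples into language -> speaker -> [utt_id].

    A speaker recorded in several languages is indexed under each of them
    (an assumption), so they can appear in episodes for any of those languages.
    """
    index = defaultdict(lambda: defaultdict(list))
    for utt_id, speaker, language in utterances:
        index[language][speaker].append(utt_id)
    return index


def sample_language_episode(index, n_way, k_shot, n_query, rng=random):
    """Draw one episode in which all n_way speakers share a single language.

    Returns (language, support, query); support and query are lists of
    (utt_id, class_idx) with disjoint utterances per speaker.
    """
    need = k_shot + n_query
    # Only languages with at least n_way speakers having enough utterances qualify.
    eligible = [
        lang for lang in sorted(index)
        if sum(len(u) >= need for u in index[lang].values()) >= n_way
    ]
    if not eligible:
        raise ValueError("no language has enough speakers/utterances for this episode size")
    language = rng.choice(eligible)
    speakers = [s for s in sorted(index[language]) if len(index[language][s]) >= need]
    chosen = rng.sample(speakers, n_way)
    support, query = [], []
    for class_idx, spk in enumerate(chosen):
        utts = rng.sample(index[language][spk], need)  # sampled without replacement
        support += [(u, class_idx) for u in utts[:k_shot]]
        query += [(u, class_idx) for u in utts[k_shot:]]
    return language, support, query


def prototypical_loss(support_emb, support_lbl, query_emb, query_lbl, n_way):
    """Mean cross-entropy of queries against class prototypes.

    Prototypes are per-class means of support embeddings; logits are negative
    squared Euclidean distances (an assumed choice; cosine is also common).
    """
    dim = len(support_emb[0])
    sums = [[0.0] * dim for _ in range(n_way)]
    counts = [0] * n_way
    for emb, c in zip(support_emb, support_lbl):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += emb[d]
    protos = [[v / counts[c] for v in sums[c]] for c in range(n_way)]
    total = 0.0
    for q, c in zip(query_emb, query_lbl):
        logits = [-sum((qi - pi) ** 2 for qi, pi in zip(q, p)) for p in protos]
        m = max(logits)  # log-sum-exp shift for numerical stability
        log_z = m + math.log(sum(math.exp(l - m) for l in logits))
        total += log_z - logits[c]
    return total / len(query_emb)
```

In a training loop, the utterance IDs from `support` and `query` would be passed through the speaker-embedding backbone before their embeddings reach `prototypical_loss`. The only difference from random episodic sampling is the single `rng.choice` over languages. Replacing it with a pool of speakers drawn from all languages gives the random-sampling baseline that the summary compares against.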