Training
UMA-Split: unimodal aggregation for both English and Mandarin non-autoregressive speech recognition
The paper introduces UMA-Split, a non-autoregressive speech recognition model that extends the unimodal aggregation (UMA) approach so that it works for both English and Mandarin. The original UMA maps each aggregated frame to a single token and struggled with English tokenization. UMA-Split addresses this limitation with a split module that lets each aggregated frame map to multiple tokens, which improves representation learning in both languages. For practitioners, this means one UMA-style non-autoregressive architecture can serve English as well as Mandarin speech recognition.
speech-recognition, non-autoregressive, llm
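The aggregate-then-split idea in the summary can be illustrated with a minimal, dependency-free sketch. Several details here are assumptions, not the paper's implementation: segment boundaries are placed at local minima ("valleys") of per-frame weights, each segment is reduced to a weight-normalized average, and the split module is modeled as a fixed set of hypothetical linear projections that turn every aggregated frame into several token-level frames (two in the example). In a real model the weights and projections would be learned, and the split frames would then be scored with a non-autoregressive loss such as CTC.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def find_valleys(weights: List[float]) -> List[int]:
    """Interior indices where the weight is a local minimum (assumed boundary rule)."""
    return [
        t
        for t in range(1, len(weights) - 1)
        if weights[t] < weights[t - 1] and weights[t] <= weights[t + 1]
    ]


def unimodal_aggregate(frames: List[Vector], weights: List[float]) -> List[Vector]:
    """Collapse each valley-delimited segment into one weight-normalized average frame."""
    cuts = [0] + find_valleys(weights) + [len(frames)]
    dim = len(frames[0])
    aggregated = []
    for start, end in zip(cuts[:-1], cuts[1:]):
        total = sum(weights[start:end])  # weights assumed positive (e.g. sigmoid outputs)
        aggregated.append(
            [sum(weights[t] * frames[t][d] for t in range(start, end)) / total for d in range(dim)]
        )
    return aggregated


def split_frames(aggregated: List[Vector], projections: List[Matrix]) -> List[Vector]:
    """Hypothetical split module: map every aggregated frame to len(projections) frames."""
    out = []
    for vec in aggregated:
        for matrix in projections:
            out.append([sum(w * x for w, x in zip(row, vec)) for row in matrix])
    return out


# Toy 1-D features: the weight valley at index 2 splits 5 frames into 2 segments.
frames = [[1.0], [2.0], [3.0], [4.0], [5.0]]
weights = [0.2, 0.8, 0.1, 0.9, 0.3]
agg = unimodal_aggregate(frames, weights)
split = split_frames(agg, [[[1.0]], [[2.0]]])  # two identity-like projections
print([round(v[0], 2) for v in split])  # prints [1.8, 3.6, 4.15, 8.31]
```

Giving each aggregated frame more than one output slot is what allows a single acoustic segment to emit more than one token. Under CTC, slots that are not needed can be filled with blanks.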