Research
Cross-lingual Embedding Clustering for Hierarchical Softmax in Low-Resource Multilingual Speech Recognition
This article presents a cross-lingual embedding clustering method for constructing a hierarchical Softmax (H-Softmax) decoder, aimed at improving multilingual performance in low-resource Automatic Speech Recognition (ASR) systems. Because similar tokens across different languages share decoder representations, the approach addresses a limitation of the previous Huffman-based H-Softmax method, which relied on shallow features to assess token similarity. Experimental results on a downsampled dataset of 15 languages indicate a significant improvement in ASR accuracy for low-resource languages, a result relevant to practitioners developing multilingual ASR systems.
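To make the idea concrete, here is a minimal sketch of a two-level hierarchical softmax whose classes come from clustering token embeddings. It is an illustration under stated assumptions, not the article's implementation: the summary does not say which clustering algorithm or tree depth is used, so plain k-means and a two-level tree are assumed here, the random vectors stand in for real cross-lingual token embeddings, and every name (`kmeans`, `two_level_probs`, `cluster_w`, `token_w`) is hypothetical.

```python
import numpy as np


def kmeans(x, k, iters=20, seed=0):
    """Plain Lloyd's k-means (an assumed choice); returns a cluster id per row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(iters):
        # Squared distance from every point to every center.
        dists = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = x[labels == c]
            if len(members) > 0:  # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels


def softmax(z):
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def two_level_probs(hidden, cluster_w, token_w, labels):
    """P(token) = P(cluster | hidden) * P(token | cluster, hidden).

    Assumes labels are contiguous ids 0..k-1 and every cluster is non-empty.
    """
    p_cluster = softmax(cluster_w @ hidden)
    probs = np.zeros(len(labels))
    for c in range(cluster_w.shape[0]):
        idx = np.flatnonzero(labels == c)
        # Softmax only over the tokens that belong to cluster c.
        probs[idx] = p_cluster[c] * softmax(token_w[idx] @ hidden)
    return probs


rng = np.random.default_rng(0)
vocab, dim, k = 12, 8, 3

# Stand-in for token embeddings pooled across languages (random, for illustration only).
token_emb = rng.normal(size=(vocab, dim))

labels = kmeans(token_emb, k)
# Relabel so cluster ids are contiguous and no cluster is empty.
_, labels = np.unique(labels, return_inverse=True)
n_clusters = int(labels.max()) + 1

cluster_w = rng.normal(size=(n_clusters, dim))  # first-level (cluster) output weights
token_w = rng.normal(size=(vocab, dim))         # second-level (token) output weights
hidden = rng.normal(size=dim)                   # one decoder hidden state

probs = two_level_probs(hidden, cluster_w, token_w, labels)
```

In this sketch, tokens that land in the same cluster share a first-level decision, which is one way tokens from different languages could share decoder representations. The product of the two softmaxes still forms a valid distribution over the whole vocabulary.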
multilingual, asr, low_resource