Research
Enhancing Multilingual LLM-based ASR with Mixture of Experts and Dynamic Downsampling
The paper presents a projector-based LLM-ASR framework that integrates a Mixture of Experts (MoE) architecture and a Continuous Integrate-and-Fire (CIF) mechanism, which dynamically downsamples the speech representation, to improve multilingual generalization and modality alignment in automatic speech recognition. The authors report significant improvements over strong baseline models, which suggests the approach could yield more accurate and robust LLM-based ASR systems. This makes the work relevant to practitioners building multilingual ASR applications on top of LLMs.
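To illustrate what "dynamic downsampling" with CIF means, here is a minimal sketch of the generic Continuous Integrate-and-Fire procedure as commonly described in the ASR literature. It is not the paper's implementation. The helper name `cif_downsample`, the firing threshold of 1.0, and the choice to drop a trailing partial accumulation are all assumptions made for illustration. In a real model the per-frame weights (`alphas`) would be predicted by a learned layer; here they are supplied by hand. The MoE component is not sketched.

```python
from typing import List, Sequence

Vector = List[float]


def cif_downsample(
    frames: Sequence[Sequence[float]],
    alphas: Sequence[float],
    threshold: float = 1.0,
) -> List[Vector]:
    """Hypothetical helper: a minimal CIF sketch, not the paper's code.

    Assumes each weight alphas[t] lies in [0, threshold]. Weights are
    accumulated frame by frame; once the running sum reaches `threshold`,
    the weighted sum of frames is emitted ("fired") as one output vector,
    and the unused part of the current frame's weight starts the next one.
    """
    if len(frames) != len(alphas):
        raise ValueError("frames and alphas must have the same length")
    dim = len(frames[0]) if frames else 0
    outputs: List[Vector] = []
    acc = 0.0                     # accumulated weight for the current output
    integrated = [0.0] * dim      # weighted sum of frames for the current output

    for h, a in zip(frames, alphas):
        if acc + a < threshold:
            # Not enough weight yet: integrate the whole frame.
            acc += a
            integrated = [x + a * v for x, v in zip(integrated, h)]
        else:
            # Fire: use just enough of this frame to reach the threshold.
            used = threshold - acc
            outputs.append([x + used * v for x, v in zip(integrated, h)])
            # The leftover weight of this frame seeds the next output.
            rest = a - used
            acc = rest
            integrated = [rest * v for v in h]
    # Assumption: a trailing partial accumulation (acc < threshold) is dropped.
    return outputs


frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 0.0]]
alphas = [0.5, 0.75, 0.5, 0.25]
print(cif_downsample(frames, alphas))  # [[0.5, 0.5], [1.0, 0.75]]
```

Four encoder frames collapse into two output vectors, and the number of outputs depends on the weights rather than on a fixed stride. This is what makes the downsampling "dynamic" compared with fixed-rate pooling or frame stacking in a projector.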
asr multilingual llm