ai-digest.dev
Inference · arXiv cs.CL · 16 d ago

NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR

NIM4-ASR is an LLM-based automatic speech recognition (ASR) framework aimed at scalability and robustness in resource-constrained environments. It uses a redesigned multi-stage training paradigm with three parts: a pre-training architecture designed to reduce modality gaps, an asynchronous supervised fine-tuning stage that maintains acoustic fidelity, and a reinforcement learning component that improves recognition quality. With only 2.3 billion parameters, NIM4-ASR is reported to achieve state-of-the-art performance on public benchmarks and to perform well in real-world scenarios. It also supports rapid hotword customization through retrieval-augmented generation, allowing efficient adaptation to user needs.
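To make the hotword idea concrete, here is a minimal sketch of retrieval-augmented hotword selection. It is not the NIM4-ASR implementation, which the summary does not describe. It assumes a first-pass draft transcript is available, uses fuzzy string matching from the Python standard library as a stand-in for the retriever, and the names `retrieve_hotwords` and `build_prompt` are hypothetical.

```python
from difflib import SequenceMatcher


def retrieve_hotwords(draft, hotwords, top_k=3, min_score=0.6):
    """Return up to top_k hotwords that fuzzily match some span of the draft.

    Assumption: a real system would likely use a learned or phonetic
    retriever; character-level similarity is only an illustration.
    """
    tokens = draft.lower().split()
    scored = []
    for hw in hotwords:
        n = len(hw.split())  # compare against draft windows of the same word count
        windows = [
            " ".join(tokens[i:i + n])
            for i in range(max(1, len(tokens) - n + 1))
        ]
        best = max(
            (SequenceMatcher(None, hw.lower(), w).ratio() for w in windows),
            default=0.0,
        )
        if best >= min_score:
            scored.append((best, hw))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [hw for _, hw in scored[:top_k]]


def build_prompt(hotwords):
    """Build a hypothetical text prompt that biases decoding toward the hotwords."""
    if not hotwords:
        return "Transcribe the audio."
    return "Transcribe the audio. Likely terms: " + ", ".join(hotwords) + "."


if __name__ == "__main__":
    user_terms = ["Kubernetes", "dashboard"]
    picked = retrieve_hotwords("turn on the kubernetties cluster", user_terms)
    print(build_prompt(picked))
```

Only the retrieved terms reach the prompt, so a large user vocabulary can be swapped at request time without touching model weights, which is the general appeal of the retrieval-based approach.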

asr · llm · efficiency · relevance 0.00 · engagement 0.00