Inference
NIM4-ASR: Towards Efficient, Robust, and Customizable Real-Time LLM-Based ASR
NIM4-ASR is an LLM-based automatic speech recognition (ASR) framework designed for scalability and robustness in resource-constrained environments. Its redesigned multi-stage training paradigm combines three parts: a pre-training architecture aimed at reducing the modality gap, an asynchronous supervised fine-tuning stage that preserves acoustic fidelity, and a reinforcement learning component that improves recognition quality. With only 2.3 billion parameters, NIM4-ASR achieves state-of-the-art performance on public benchmarks and performs strongly in real-world scenarios. It also supports rapid hotword customization through retrieval-augmented generation, so the model can be adapted to user-specific vocabulary efficiently.
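To make the hotword idea concrete, here is a minimal sketch of how retrieval-augmented hotword customization could work in general. It is not NIM4-ASR's actual method, which the summary does not describe. The sketch assumes a first-pass draft transcript is available, uses fuzzy string matching from Python's standard `difflib` as a stand-in retriever, and injects the retrieved terms into a text prompt. The function names `retrieve_hotwords` and `build_prompt`, the prompt wording, and the thresholds are all hypothetical.

```python
from difflib import SequenceMatcher


def retrieve_hotwords(draft, hotwords, top_k=3, min_score=0.6):
    """Return up to top_k hotwords that fuzzily match a word window of the draft.

    Hypothetical retriever: a real system might use phonetic or embedding
    similarity instead of character-level matching.
    """
    tokens = draft.lower().split()
    scored = []
    for hw in hotwords:
        if not hw.strip():
            continue  # skip empty entries in the user's list
        n = len(hw.split())
        # Compare the hotword with every n-word window of the draft.
        windows = [
            " ".join(tokens[i:i + n])
            for i in range(max(1, len(tokens) - n + 1))
        ]
        best = max(SequenceMatcher(None, hw.lower(), w).ratio() for w in windows)
        if best >= min_score:
            scored.append((best, hw))
    # Highest similarity first; keep only the top_k candidates.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [hw for _, hw in scored[:top_k]]


def build_prompt(hotwords):
    """Build an illustrative decoding prompt that lists the retrieved terms."""
    if not hotwords:
        return "Transcribe the audio."
    return "Transcribe the audio. Possible terms: " + ", ".join(hotwords) + "."


if __name__ == "__main__":
    user_terms = ["Kubernetes", "dashboard", "PyTorch"]
    found = retrieve_hotwords("open the kubernets dashboard", user_terms)
    print(build_prompt(found))
```

The design point is that only the few terms relevant to the current utterance reach the model, so a user's hotword list can grow or change without retraining and without putting the whole list in the prompt.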
asr, llm, efficiency