Multimodal
BayLing-Duplex: Native Full-Duplex Speech Dialogue with a Single Autoregressive LLM
BayLing-Duplex is a native full-duplex Speech Language Model (SpeechLM) that supports real-time speech interaction without an external Voice Activity Detection (VAD) module. Built on the GLM-4-Voice checkpoint and fine-tuned on 400K full-duplex samples, it achieves a 92% turn-taking success rate and a 100% interruption success rate, and raises the speech-response score from 2.17 to 3.39. The results suggest that listening and speaking simultaneously need not degrade response quality and can improve it, which is relevant to practitioners building more interactive and responsive spoken chatbots.
speech-dialogue, llm, real-time