Models
WavSLM: Single-Stream Speech Language Modeling via WavLM Distillation
WavSLM is a speech language model trained with a single-stream autoregressive objective: self-supervised WavLM representations are distilled into a single codebook, and the model predicts the next chunk of audio without text supervision. It reports competitive performance on consistency benchmarks and speech generation tasks while using fewer parameters and less training data, and it supports streaming inference. For practitioners, the single stream simplifies how semantic and acoustic information are combined in a speech model, which could make speech processing applications more efficient.
speech wavlm llm
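To make the summary's "single codebook, next-chunk prediction" idea concrete, here is a minimal sketch. It is not the WavSLM implementation: nearest-neighbor quantization stands in for the distilled codebook, the toy codebook and features are made up, and the names `quantize` and `next_chunk_pairs` are hypothetical. It only shows how continuous frame features can become one token stream and how that stream splits into (context, next chunk) training pairs.

```python
import numpy as np

def quantize(features, codebook):
    """Map each feature frame to the index of its nearest codebook vector."""
    # Squared Euclidean distance between every frame and every code: shape (T, K)
    dists = ((features[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)
    return dists.argmin(axis=1)

def next_chunk_pairs(tokens, chunk):
    """Split one token stream into (context, target chunk) pairs."""
    pairs = []
    for start in range(chunk, len(tokens) - chunk + 1, chunk):
        context = tokens[:start]               # everything seen so far
        target = tokens[start:start + chunk]   # the chunk to predict next
        pairs.append((context, target))
    return pairs

# Toy stand-ins (assumptions): 4 codes of dimension 4, and 6 frames that are
# slightly offset copies of codebook entries.
codebook = np.eye(4)
features = codebook[[3, 3, 1, 2, 0, 0]] + 0.1

tokens = quantize(features, codebook)
pairs = next_chunk_pairs(tokens, chunk=2)

for context, target in pairs:
    print(context.tolist(), "->", target.tolist())
```

In an actual model, each pair would feed an autoregressive network trained to predict the target tokens from the context. Because the context only ever extends to the left of the target, the same setup is compatible with streaming inference.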