RAG · arXiv cs.AI · 10 d ago

LM-SPT: LM-Aligned Semantic Distillation for Speech Tokenization

The article presents LM-SPT, a speech tokenization method that uses semantic distillation through speech resynthesis to improve alignment with language models (LMs). Unlike traditional approaches, which rely on self-supervised learning (SSL) teachers and pooling, LM-SPT resynthesizes speech from semantic tokens, which allows lower frame rates and improves semantic alignment without sacrificing speech reconstruction fidelity. The reported experiments show LM-SPT outperforming existing semantic-enhanced tokenizers on automatic speech recognition and text-to-speech tasks, a result relevant to practitioners seeking to integrate speech language models (SLMs) more effectively with LMs.

speech-tokenization · semantic-distillation · language-models · relevance 0.00 · engagement 0.00
