Research · arXiv cs.CL · 15 d ago

DiFlow-TTS: Compact and Low-Latency Zero-Shot Text-to-Speech with Discrete Flow Matching

DiFlow-TTS is a zero-shot text-to-speech framework that uses discrete flow matching to improve generation quality and inference efficiency. It pairs a deterministic Phoneme-Content Mapper for linguistic modeling with a Factorized Discrete Flow Denoiser that generates the prosody and acoustic token streams simultaneously. The design targets two known pain points: the latency of autoregressive models and the training constraints of diffusion-based methods. That makes it relevant to TTS practitioners who need compact, low-latency synthesis.

text-to-speech · zero-shot · discrete-flow
