Research
Mask, Sample, Revise: A Revisable CTMC Inference Stack for Guided Discrete Flow Matching Text-to-Speech
The article presents "Mask, Sample, Revise," an inference-time Continuous-Time Markov Chain (CTMC) stack for alignment-free Discrete Flow Matching (DFM) text-to-speech (TTS). The stack strengthens text conditioning with predictor-free guidance. It also adds a schedule-constrained remasking mechanism (SC-ReMask), which lets the sampler revise de-masking decisions made early in generation. According to the article, these changes improve intelligibility and robustness without any post-hoc fine-tuning. For practitioners, this means more control over discrete generation at inference time, with the reported gains concentrated in low-step conditional infilling.
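The summary does not give SC-ReMask's exact rule, so the following is only a hypothetical sketch of the general "mask, sample, revise" idea. It uses a discrete-step loop and does not simulate a CTMC. It makes three assumptions. First, guidance is modeled as a classifier-free-guidance-style mix of conditional and unconditional logits (`guided_logits`, weight `w`). Second, a linear schedule (`masked_count`) fixes how many positions must remain masked after each step. Third, "revision" means that the lowest-confidence tokens are remasked to satisfy that count, including tokens committed in earlier steps. All function names and the toy model are illustrative inventions, not the authors' implementation.

```python
import numpy as np

MASK = -1  # hypothetical sentinel id marking a masked position


def guided_logits(cond, uncond, w):
    # Assumed CFG-style guidance: push logits away from the unconditional
    # prediction and toward the text-conditional one by weight w.
    return uncond + w * (cond - uncond)


def softmax(x):
    z = x - x.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def masked_count(t, n):
    # Assumed linear schedule: all n positions masked at t=0, none at t=1.
    return int(round(n * (1.0 - t)))


def sc_remask_sample(model, n, vocab, steps, w=2.0, seed=0):
    """Hypothetical schedule-constrained remasking sampler (sketch only)."""
    rng = np.random.default_rng(seed)
    x = np.full(n, MASK)  # start fully masked
    conf = np.zeros(n)    # probability of the current token at each position
    for k in range(1, steps + 1):
        t = k / steps
        cond, uncond = model(x)  # each of shape (n, vocab)
        p = softmax(guided_logits(cond, uncond, w))
        masked = x == MASK
        # Mask -> Sample: draw a proposal for every currently masked slot.
        for i in np.flatnonzero(masked):
            tok = rng.choice(vocab, p=p[i])
            x[i] = tok
            conf[i] = p[i, tok]
        # Rescore previously committed tokens under the current distribution.
        idx = np.flatnonzero(~masked)
        conf[idx] = p[idx, x[idx]]
        # Revise: keep exactly masked_count(t, n) positions masked by remasking
        # the least confident ones, whether newly sampled or committed earlier.
        m = masked_count(t, n)
        if m > 0:
            worst = np.argsort(conf)[:m]
            x[worst] = MASK
    return x


def make_toy_model(n, vocab):
    # Hypothetical stand-in for a DFM network: it ignores x, and its
    # conditional logits favour token (i % vocab) at position i.
    def model(x):
        cond = np.zeros((n, vocab))
        cond[np.arange(n), np.arange(n) % vocab] = 4.0
        uncond = np.zeros((n, vocab))
        return cond, uncond
    return model


if __name__ == "__main__":
    n, vocab = 8, 4
    out = sc_remask_sample(make_toy_model(n, vocab), n, vocab, steps=4, w=50.0)
    print(out)
```

In this sketch the schedule fixes only *how many* positions stay masked at each step. Confidence decides *which* positions those are, so a token committed early can be withdrawn and resampled later. That is the kind of revision the summary attributes to SC-ReMask, though the paper's actual criterion may differ.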
text-to-speech, inference, ctmc