Multimodal · Reddit r/LocalLLaMA · 6 d ago

ZONOS2: real-time TTS with 8B params, 900M active, and high-fidelity voice cloning

Zyphra has released ZONOS2, a real-time text-to-speech (TTS) model with 8 billion total parameters, of which 900 million are active during inference. It is presented as the first sparse mixture-of-experts (MoE) TTS model released as open source, and it aims to combine high-fidelity voice cloning with fast synthesis. ZONOS2 focuses on zero-shot voice cloning that captures speaker-specific traits, and it achieves a TTSDS Prosody Score of 88.7, which is reported to be higher than several existing models.
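As a quick check on the sparsity figures above, the share of parameters active per forward pass follows directly from the two numbers in the summary. The sketch below is plain arithmetic; the constant and function names are illustrative and do not reflect ZONOS2's actual code or API.

```python
# Figures quoted in the summary; the names here are illustrative only.
TOTAL_PARAMS = 8_000_000_000   # 8B total parameters
ACTIVE_PARAMS = 900_000_000    # 900M active during inference


def active_fraction(active: int, total: int) -> float:
    """Return the fraction of parameters used per forward pass."""
    if total <= 0:
        raise ValueError("total must be positive")
    return active / total


fraction = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
print(f"{fraction:.2%} of parameters are active")  # 11.25%
```

That is roughly one parameter in nine. In sparse MoE models generally, per-token compute tracks the active count while memory still has to hold the full parameter set, so the 8B figure is the one to compare against available VRAM.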

tts · voice_cloning · zonos2 · relevance 0.00 · engagement 0.00
