Multimodal
ZONOS2: real-time TTS with 8B params, 900M active, and high-fidelity voice cloning
Zyphra has released ZONOS2, a real-time text-to-speech (TTS) model with 8 billion total parameters, of which 900 million are active during inference. It is notable as the first sparse mixture-of-experts (MoE) TTS model released as open source, and it aims to combine high-fidelity voice cloning with fast synthesis. ZONOS2 performs zero-shot voice cloning that captures speaker-specific traits, and it achieves a TTSDS Prosody Score of 88.7, outperforming several existing models, which makes it relevant to practitioners building TTS applications.
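To make the "8B total, 900M active" figure concrete, here is a minimal Python sketch of the arithmetic and of generic top-k expert routing, the usual mechanism behind sparse MoE models. Only the two parameter counts come from the announcement; the helper names, the expert count (8), the value of k (2), and the router logits are hypothetical illustrations and do not describe ZONOS2's actual architecture.

```python
import math


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the model's parameters used in a single forward pass."""
    return active_params / total_params


def top_k_route(router_logits: list[float], k: int) -> dict[int, float]:
    """Select the k highest-scoring experts and softmax-normalize their weights."""
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    top = ranked[:k]
    peak = max(router_logits[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    norm = sum(exps.values())
    return {i: e / norm for i, e in exps.items()}


# Figures from the announcement: 8B total parameters, 900M active.
fraction = active_fraction(8e9, 900e6)

# Hypothetical toy router: 8 experts, 2 selected per token (assumed values).
weights = top_k_route([0.1, 2.0, -1.0, 0.5, 1.5, 0.0, -0.3, 0.9], k=2)

print(f"active share: {fraction:.2%}")
print(f"selected experts and mixing weights: {weights}")
```

Under these assumptions, only the selected experts run for a given token, which is why a model's per-token compute can track its active parameter count rather than its total size.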
tts, voice_cloning, zonos2