ai-digest.dev
last updated 2 h ago
Research · arXiv cs.AI · 9 d ago

An Empirical Study on Learning Latent Representations for Emotional Speech Synthesis

The paper studies emotional speech synthesis (ESS) with an enhanced FastSpeech 2 architecture that adds speaker embeddings and a prosody bottleneck. The proposed system generates emotional speech for a single speaker and transfers speaking styles while preserving the speaker identity learned from neutral data. For practitioners, the work addresses expressiveness in text-to-speech systems, with the goal of improving the naturalness and variability of synthesized speech.

speech-synthesis · emotional-speech · relevance 0.00 · engagement 0.00