Multimodal
Modeling Sarcastic Speech: Semantic and Prosodic Cues in a Speech Synthesis Framework
The article presents a computational framework for modeling sarcasm in speech synthesis that combines semantic cues from a fine-tuned LLaMA 3 model with prosodic cues drawn from a database of sarcastic speech. In perceptual evaluations, the combined approach improves sarcasm recognition, with superior downstream F1 scores and high subjective ratings. For practitioners, the work shows that both semantic interpretation and prosodic delivery matter for making synthesized speech effective.
sarcasm, speech-synthesis, prosody