Research
PASQA: Pitch-Accent-Focused Speech Quality Assessment Model Trained on Synthetic Speech with Accent Errors
The article presents the Pitch-Accent-focused Speech Quality Assessment (PASQA) model, which is designed to predict pitch-accent correctness as part of speech quality assessment. PASQA is trained on a controlled dataset of synthetic Japanese speech containing accent errors, and it combines self-supervised speech representations, mora-conditioned fusion, and an auxiliary accent-error localization task. According to the article, the model orders accent-error severity more accurately and agrees more closely with human judgments than conventional mean opinion score (MOS) prediction models, which makes it relevant to practitioners who need fine-grained speech evaluation in AI applications.
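To make "ordering accent-error severity" concrete, the following is a minimal sketch of one way such ordering could be checked. It is not the article's evaluation protocol. The function name `severity_ordering_accuracy`, the input layout, and the toy scores are all assumptions made for illustration. The idea is that, for variants of the same sentence with different numbers of accent errors, a good predictor should give the variant with fewer errors the higher quality score.

```python
from itertools import combinations


def severity_ordering_accuracy(items):
    """Fraction of variant pairs whose predicted scores respect error severity.

    `items` is a list of (num_accent_errors, predicted_score) tuples for
    variants of one sentence. This layout is a hypothetical choice for the
    sketch, not a format taken from the article.
    """
    correct = 0
    total = 0
    for (err_a, score_a), (err_b, score_b) in combinations(items, 2):
        if err_a == err_b:
            continue  # same severity: there is no ordering to check
        total += 1
        # Ties in score count as incorrect; otherwise the variant with
        # fewer accent errors must have the strictly higher score.
        if score_a != score_b and (err_a < err_b) == (score_a > score_b):
            correct += 1
    return correct / total if total else float("nan")


# Toy, made-up scores (not results from the article): the 2-error variant
# is scored above the 1-error variant, so one of six pairs is misordered.
toy_items = [(0, 4.2), (1, 3.6), (2, 3.9), (3, 2.1)]
toy_accuracy = severity_ordering_accuracy(toy_items)
print(f"pairwise ordering accuracy: {toy_accuracy:.3f}")
```

A pairwise measure like this looks only at relative order, so it is insensitive to the absolute scale of the predicted scores. Agreement with human judgments would typically be measured separately, for example with a rank correlation against listener ratings.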
speech-quality, accent-errors, mos, llm