Coding
Automated Pronunciation Evaluation for Korean Toddler Speech using Speech Diarization and Self-Supervised Learning
The paper presents an end-to-end automated pronunciation evaluation pipeline for Korean toddler speech that integrates neural speaker diarization with self-supervised learning. It introduces a new corpus of 53 recordings from children aged 2 to 5 and evaluates three diarization models, among which NeMo SortFormer achieves 88.69% speaker count accuracy and a 33.04% diarization error rate. For pronunciation scoring, an ensemble of HuBERT-large and WavLM-large yields balanced accuracies of 0.720 for consonants and 0.845 for vowels, which suggests the approach could support automated speech assessment tools for pediatric communication disorders.
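The two headline metrics above have standard definitions, sketched here for readers unfamiliar with them. This is a minimal illustration, not the paper's evaluation code: the function names and the toy inputs are hypothetical, balanced accuracy is taken as the mean of per-class recall, and diarization error rate is taken as missed speech plus false alarm plus speaker confusion, divided by total reference speech time. Whether the paper applies a scoring collar or handles overlap differently is not stated in the summary.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Mean of per-class recall, so each class counts equally
    regardless of how many samples it has."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    recalls = [correct[label] / total[label] for label in total]
    return sum(recalls) / len(recalls)


def diarization_error_rate(missed, false_alarm, confusion, total_speech):
    """DER as a fraction: all three error durations (in seconds)
    divided by the total duration of reference speech."""
    if total_speech <= 0:
        raise ValueError("total_speech must be positive")
    return (missed + false_alarm + confusion) / total_speech


# Toy data (hypothetical): 1 = correct pronunciation, 0 = error.
labels = [1, 1, 1, 0]
predictions = [1, 1, 0, 0]
print(balanced_accuracy(labels, predictions))

# Toy durations in seconds (hypothetical).
print(diarization_error_rate(1.0, 2.0, 1.0, 10.0))
```

Balanced accuracy is a natural choice when correct pronunciations far outnumber errors, since plain accuracy would reward a model that always predicts the majority class.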
speech, self-supervised, evaluation