Training
Cross-Dataset, Age, and Gender Generalization: A Comprehensive Analysis of Fine-Tuning Strategies for Low-Resource Children's ASR
This paper analyzes fine-tuning strategies for Automatic Speech Recognition (ASR) systems that recognize dysarthric speech, using the Factorized Time Delay Neural Network (F-TDNN) architecture. Incorporating pitch features and optimizing the overlap of training frames yields a 4.65% relative improvement in isolated word recognition and a 4.63% improvement in sentence recognition. For practitioners, the work offers feature selection strategies and model adjustments that improve ASR performance in low-resource scenarios involving variable speech patterns.
ASR, dysarthric speech, data augmentation