Training
Improving End-to-End Speech Recognition for Dysarthric Speech through In-Domain Data Augmentation
The paper improves automatic speech recognition (ASR) for dysarthric speech by using in-domain data augmentation to fine-tune the Wav2Vec2 model. It investigates four methods (Speaking-Rate Modification, Pitch Modification, Formant Modification, and Vocal Tract Length Perturbation), tailored to different dysarthria severity levels. The best word error rates (WERs) are 9.02% for low-severity and 55.15% for high-severity speech, with relative improvements of up to 30.02%. For practitioners, the work shows effective strategies for strengthening ASR systems in low-data settings, particularly for users with varying degrees of speech impairment.
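To make one of the four methods concrete, here is a minimal sketch of a vocal tract length perturbation (VTLP) frequency warp. It uses a commonly cited piecewise-linear formulation in which frequencies are scaled by a warp factor `alpha` up to a boundary, after which a second linear segment keeps the Nyquist frequency fixed. The function name `vtlp_warp`, the 16 kHz sample rate, and the `f_hi` default of 4800 Hz are illustrative assumptions, not parameters taken from the paper, whose exact VTLP settings are not given in this summary.

```python
def vtlp_warp(freq_hz: float, alpha: float, sample_rate: int = 16000,
              f_hi: float = 4800.0) -> float:
    """Warp one frequency under a piecewise-linear VTLP scheme (sketch).

    Assumptions (hypothetical, not from the paper):
      alpha       -- warp factor, e.g. drawn near 1.0 per utterance
      sample_rate -- 16 kHz audio, as commonly fed to Wav2Vec2
      f_hi        -- illustrative boundary frequency for the warp
    """
    nyquist = sample_rate / 2.0
    if not 0.0 <= freq_hz <= nyquist:
        raise ValueError("frequency must lie in [0, Nyquist]")

    # Point where the lower segment (pure scaling) hands over to the upper one.
    boundary = f_hi * min(alpha, 1.0) / alpha
    if freq_hz <= boundary:
        # Lower segment: scale the frequency linearly by alpha.
        return freq_hz * alpha

    # Upper segment: straight line from (boundary, knee) to (nyquist, nyquist),
    # so the warped axis still ends exactly at the Nyquist frequency.
    knee = f_hi * min(alpha, 1.0)
    return nyquist - (nyquist - knee) / (nyquist - boundary) * (nyquist - freq_hz)


if __name__ == "__main__":
    for a in (0.9, 1.1):
        warped = [round(vtlp_warp(f, a), 1) for f in (0.0, 1000.0, 4000.0, 8000.0)]
        print(f"alpha={a}: {warped}")
```

In a full pipeline this mapping would typically be applied to the center frequencies of a filterbank or to the frequency axis of a spectrogram, with a fresh `alpha` per utterance so the model sees a range of simulated vocal tract lengths. Keeping the warp piecewise linear avoids pushing energy past the Nyquist frequency when `alpha > 1`.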
dysarthric speech, data augmentation, ASR