Training
Learning to Hear Hesitation: Continual Learning for Disfluency-Aware ASR
This paper presents a continual learning approach that improves how Automatic Speech Recognition (ASR) systems handle disfluent speech by adding explicit disfluency tokens to a pretrained model. The method targets catastrophic forgetting when the model is adapted to new datasets with different disfluency distributions, and it reveals a trade-off between learning disfluency markers and overall ASR performance. For practitioners, the findings suggest a framework for improving ASR accuracy in real-world settings where disfluencies are common, potentially reducing information loss and hallucinations in transcriptions.
asr, disfluency, continual-learning
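
To make the idea of adding explicit disfluency tokens to a pretrained model more concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the token names, the `extend_vocab` helper, and the mean-of-existing-rows initialization are all assumptions chosen for illustration. The sketch also assumes that vocabulary ids are contiguous and index directly into the embedding rows.

```python
from typing import Dict, List, Tuple

# Hypothetical disfluency markers; the paper's actual token inventory is not stated here.
DISFLUENCY_TOKENS = ["<filled_pause>", "<repetition>", "<false_start>"]


def extend_vocab(
    vocab: Dict[str, int],
    embeddings: List[List[float]],
    new_tokens: List[str],
) -> Tuple[Dict[str, int], List[List[float]]]:
    """Return copies of vocab and embeddings with new_tokens appended.

    Pretrained rows are copied unchanged. Each new row starts at the mean of
    the existing rows, one common heuristic (assumed here, not taken from the
    paper) for giving new tokens a neutral starting point.
    """
    if not embeddings:
        raise ValueError("embeddings must contain at least one row")

    dim = len(embeddings[0])
    # Column-wise mean over all pretrained embedding rows.
    mean_row = [sum(row[i] for row in embeddings) / len(embeddings) for i in range(dim)]

    new_vocab = dict(vocab)
    new_embeddings = [list(row) for row in embeddings]
    for token in new_tokens:
        if token in new_vocab:
            continue  # Skip tokens the vocabulary already has.
        new_vocab[token] = len(new_embeddings)
        new_embeddings.append(list(mean_row))
    return new_vocab, new_embeddings


# Toy usage with a two-word vocabulary and two-dimensional embeddings.
toy_vocab = {"hello": 0, "world": 1}
toy_embeddings = [[1.0, 0.0], [0.0, 1.0]]
extended_vocab, extended_embeddings = extend_vocab(
    toy_vocab, toy_embeddings, DISFLUENCY_TOKENS
)
```

In a real system the new rows would then be trained on disfluency-annotated data, and the continual learning strategy would govern how much the pretrained rows are allowed to move. That step is where the trade-off described in the summary arises, and it is deliberately left out of this sketch.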