Research
PiDA: Phonetically-Informed Data Augmentation for Robust Vietnamese Speech Translation
This article presents Phonetically-Informed Data Augmentation (PiDA), a method for making Vietnamese speech translation systems more robust to substitution errors introduced by Automatic Speech Recognition (ASR). PiDA uses phonetic word embeddings to generate ASR-like corruptions as augmentation data, which improves translation accuracy on erroneous ASR output. On the FLEURS Vietnamese-English dataset, it raises BLEU by up to 2.04 points over standard fine-tuning. For practitioners, it offers a systematic way to limit ASR error propagation in cascaded speech translation systems.
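To make the idea of ASR-like corruption concrete, here is a minimal sketch of phonetic substitution. It is an illustration under stated assumptions, not the article's implementation: the embedding table, its vectors, and the names `phonetic_neighbors`, `corrupt`, `sub_rate`, and `k` are all hypothetical, and a real system would use learned phonetic word embeddings over a full vocabulary.

```python
import math
import random

# Hypothetical toy "phonetic embeddings": similar-sounding words get nearby
# vectors. The values are invented for illustration only.
PHONETIC_EMBEDDINGS = {
    "ban": (1.0, 0.1),
    "bang": (0.95, 0.2),
    "bán": (0.9, 0.05),
    "tôi": (-1.0, 0.3),
    "tối": (-0.95, 0.35),
    "đi": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def phonetic_neighbors(word, k=2):
    """Return up to k other words closest to `word` in the embedding space."""
    if word not in PHONETIC_EMBEDDINGS:
        return []
    target = PHONETIC_EMBEDDINGS[word]
    scored = [
        (cosine(target, vec), other)
        for other, vec in PHONETIC_EMBEDDINGS.items()
        if other != word
    ]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]


def corrupt(tokens, sub_rate=0.3, k=2, rng=None):
    """Replace each in-vocabulary token with a phonetic neighbor
    with probability `sub_rate`, mimicking ASR substitution errors."""
    rng = rng or random.Random()
    corrupted = []
    for token in tokens:
        candidates = phonetic_neighbors(token, k)
        if candidates and rng.random() < sub_rate:
            corrupted.append(rng.choice(candidates))
        else:
            corrupted.append(token)
    return corrupted


if __name__ == "__main__":
    clean = ["tôi", "đi", "bán", "bang"]
    print(corrupt(clean, sub_rate=0.5, rng=random.Random(0)))
```

Pairing each corrupted sentence with the reference translation of its clean original would yield noisy training examples for fine-tuning. The substitution rate and neighbor count here are arbitrary choices for the sketch.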
speech-translation, data-augmentation, ASR