RAG
RASST: Retrieval-Augmented Simultaneous Speech Translation
The article introduces Retrieval-Augmented Simultaneous Speech Translation (RASST), a model that improves simultaneous speech translation by adding a lightweight speech-text retriever, which performs cross-modal retrieval of terminology from partial speech input. The authors report that RASST improves terminology accuracy by nearly 40% and overall translation quality by up to 3 BLEU points while keeping computational overhead low. This matters to practitioners because it targets rare and domain-specific terminology, a known challenge in real-time translation.
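To make the retrieval idea concrete, here is a minimal sketch of glossary lookup by embedding similarity. It is not the RASST implementation: the 3-dimensional vectors are hand-made stand-ins for what a speech encoder and a text encoder would produce, and the names `retrieve_terms`, `format_hints`, `GLOSSARY`, `k`, and `min_score` are hypothetical, chosen for illustration only.

```python
import math

# Hypothetical glossary: (source term, target term, toy text embedding).
# In a real system the embeddings would come from a trained text encoder.
GLOSSARY = [
    ("beam search", "Strahlsuche", [0.9, 0.1, 0.0]),
    ("tokenizer", "Tokenizer", [0.0, 1.0, 0.1]),
    ("latency", "Latenz", [0.1, 0.0, 0.9]),
]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_terms(chunk_vec, glossary, k=2, min_score=0.5):
    """Return up to k (source, target) pairs whose embedding is close
    to the embedding of the current partial speech chunk."""
    scored = [(cosine(chunk_vec, vec), src, tgt) for src, tgt, vec in glossary]
    scored.sort(key=lambda item: item[0], reverse=True)
    # The threshold avoids injecting unrelated terms when nothing matches.
    return [(src, tgt) for score, src, tgt in scored[:k] if score >= min_score]


def format_hints(hits):
    """Render retrieved pairs as a hint string that could be passed to a translator."""
    return "; ".join(f"{src} -> {tgt}" for src, tgt in hits)


# Assumed embedding of a partial speech chunk (normally a speech encoder's output).
chunk_vec = [1.0, 0.2, 0.0]
hits = retrieve_terms(chunk_vec, GLOSSARY)
print(format_hints(hits))
```

In this sketch the similarity threshold does the work of deciding when to stay silent, which matters for partial input: a chunk that has not yet reached a glossary term should retrieve nothing rather than a weak match.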
speech-translation, retrieval-augmented, llm