Research
A Comparative Study of Pretrained Transformer Models for Quranic ASR: Speech Representations, Label Formats, and Dataset Composition
This study presents a systematic evaluation of pretrained Transformer models for Quranic Automatic Speech Recognition (ASR), fine-tuning Wav2Vec2.0, HuBERT, and XLS-R on a dataset of more than 870 hours of recitations. The best-performing model achieved a Word Error Rate (WER) of 0.08 on the EveryAyah subset, nearly halving the WER of the Citrinet baseline (0.163) while reducing training time from 140 to 40 hours. These findings highlight the importance of model selection, dataset composition, and feature extraction techniques for transcription accuracy in domain-specific ASR applications.
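The results above are reported as Word Error Rate. As a reminder of what that number measures, here is a minimal sketch of the standard definition: the word-level edit distance (substitutions + deletions + insertions) between a reference and a hypothesis, divided by the number of reference words. It assumes simple whitespace tokenization and is not necessarily the evaluation code used in the study; `word_error_rate` is a hypothetical helper name.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Standard WER: word-level Levenshtein distance / reference length.

    Assumes whitespace tokenization (hypothetical choice for illustration).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (e) over 5 reference words.
    print(word_error_rate("a b c d e", "a x c d"))  # 0.4
```

Under this definition, a WER of 0.08 corresponds to roughly 8 word errors per 100 reference words. Real Arabic evaluation pipelines often normalize text (for example, diacritics) before scoring, which can change the result.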
asr, quran, transformer