Research
Light-weight Pronunciation Assessment via Discrete Speech Token Surprisal
A new framework for lightweight pronunciation assessment does not require extensive labeled learner data; instead, it relies only on native speech resources. A self-supervised learning (SSL) encoder and a K-means codebook discretize learner speech into tokens, and a token language model computes the surprisal of each token to identify phonotactic deviations. On the SpeechOcean762 dataset, transcript guidance raises the Pearson correlation coefficient (PCC) from 0.60 to 0.66, approaching supervised benchmarks. The method also remains robust in cross-dataset evaluation on L2-ARCTIC.
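The discretize-then-score pipeline above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. Toy 2-D vectors stand in for frame-level SSL features, so no real encoder is run. A plain Lloyd's k-means builds the codebook from native frames. A bigram model with add-alpha smoothing stands in for the token language model, whose actual architecture is not described here. Surprisal is taken as -log2 P(token | previous token), and all names (`fit_kmeans`, `tokenize`, `BigramTokenLM`, `mean_surprisal`) are hypothetical.

```python
import math
from collections import Counter


def nearest_centroid(frame, codebook):
    """Index of the codebook vector closest to `frame` (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda k: sum((f - c) ** 2 for f, c in zip(frame, codebook[k])),
    )


def fit_kmeans(frames, k, iters=20):
    """Plain Lloyd's k-means on native frames; first k frames seed the centroids (toy choice)."""
    codebook = [list(f) for f in frames[:k]]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for f in frames:
            clusters[nearest_centroid(f, codebook)].append(f)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster becomes empty
                codebook[j] = [sum(dim) / len(members) for dim in zip(*members)]
    return codebook


def tokenize(frames, codebook):
    """Discretize a sequence of frame features into codebook token IDs."""
    return [nearest_centroid(f, codebook) for f in frames]


class BigramTokenLM:
    """Hypothetical stand-in for the token LM: bigram counts with add-alpha smoothing."""

    def __init__(self, vocab_size, alpha=1.0):
        self.vocab_size = vocab_size
        self.alpha = alpha
        self.bigrams = Counter()
        self.contexts = Counter()

    def fit(self, sequences):
        """Count token bigrams in native token sequences."""
        for seq in sequences:
            for prev, cur in zip(seq, seq[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
        return self

    def prob(self, prev, cur):
        """Smoothed P(cur | prev)."""
        num = self.bigrams[(prev, cur)] + self.alpha
        den = self.contexts[prev] + self.alpha * self.vocab_size
        return num / den

    def surprisal(self, seq):
        """Per-token surprisal in bits: -log2 P(token | previous token)."""
        return [-math.log2(self.prob(p, c)) for p, c in zip(seq, seq[1:])]


def mean_surprisal(lm, seq):
    """Utterance-level score (assumed aggregation): mean per-token surprisal."""
    s = lm.surprisal(seq)
    return sum(s) / len(s) if s else 0.0


if __name__ == "__main__":
    # Toy "native" features alternate between two regions of feature space.
    native = [[0.0, 0.0], [5.0, 5.0]] * 10
    codebook = fit_kmeans(native, k=2)
    lm = BigramTokenLM(vocab_size=2).fit([tokenize(native, codebook)])

    typical = [[0.1, 0.0], [4.9, 5.0], [0.0, 0.1], [5.0, 4.8]]   # follows the native pattern
    atypical = [[0.1, 0.0], [0.0, 0.2], [0.1, 0.1], [0.2, 0.0]]  # stays in one region
    print(mean_surprisal(lm, tokenize(typical, codebook))
          < mean_surprisal(lm, tokenize(atypical, codebook)))  # prints True
```

Averaging per-token surprisal into one utterance score is only one simple aggregation choice, assumed here for illustration. A higher mean means the learner's token sequence is less expected under the native model. Transcript guidance is not modeled in this sketch.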
pronunciation, assessment, speech