Research
Perceptual compensation for tonal context in self-supervised speech models
This study investigates whether the wav2vec 2.0 architecture can compensate for phonological context in Mandarin Chinese tones, using a pseudo-replication of a perceptual compensation experiment. The embedding similarities of the purely self-supervised model showed no evidence of compensation; the fine-tuned model's classifier outputs showed some improvement but did not reach human-level performance. These findings suggest that supervised training may be essential for models to abstract certain phonological regularities, a consideration relevant to practitioners developing robust ASR systems.
self-supervised models, speech recognition, phonological context