Research · arXiv cs.CL · 11 days ago

Scaling Human and G2P Supervision for Robust Phonetic Transcription

The study examines how well Grapheme-to-Phoneme (G2P) supervision supports automatic phonetic transcription when it is combined with human-annotated data. On an 80-hour benchmark, the authors identify a threshold: G2P supervision helps only when human annotations are limited (20 to 30 hours), and beyond that point it may hurt performance. They also report that ASR pretraining substantially improves transcription accuracy, reducing weighted phone feature error rate by 2.3x, particularly for non-native and aphasic speech. Taken together, the results suggest that relying on G2P supervision alone yields diminishing returns for robust phonetic transcription.
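The summary reports results in weighted phone feature error rate but does not define the metric. A common formulation is a feature-weighted edit distance. We assume that formulation here; the paper's exact weighting may differ. In this version, insertions and deletions cost 1, and a substitution costs the fraction of articulatory features on which the two phones disagree. The total cost is then normalized by reference length. The feature table `FEATURES` and the function names below are hypothetical and serve only as illustration.

```python
# Hypothetical toy feature table: each phone maps to a tuple of binary
# articulatory features (voiced, nasal, labial, coronal, continuant).
# A real system would use a full phone inventory; these values are illustrative.
FEATURES = {
    "p": (0, 0, 1, 0, 0),
    "b": (1, 0, 1, 0, 0),
    "m": (1, 1, 1, 0, 0),
    "t": (0, 0, 0, 1, 0),
    "d": (1, 0, 0, 1, 0),
    "s": (0, 0, 0, 1, 1),
}


def substitution_cost(a: str, b: str) -> float:
    """Fraction of features on which phones a and b disagree (0.0 to 1.0)."""
    fa, fb = FEATURES[a], FEATURES[b]
    return sum(x != y for x, y in zip(fa, fb)) / len(fa)


def weighted_phone_feature_error_rate(ref: list[str], hyp: list[str]) -> float:
    """Feature-weighted edit distance divided by reference length.

    Assumed definition: insertions and deletions cost 1.0, substitutions cost
    the fraction of mismatched features, so near-miss phones count as partial errors.
    """
    n, m = len(ref), len(hyp)
    # dp[i][j] = minimum cost of aligning ref[:i] with hyp[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = float(i)  # delete every reference phone
    for j in range(1, m + 1):
        dp[0][j] = float(j)  # insert every hypothesis phone
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = min(
                dp[i - 1][j] + 1.0,  # deletion
                dp[i][j - 1] + 1.0,  # insertion
                dp[i - 1][j - 1] + substitution_cost(ref[i - 1], hyp[j - 1]),
            )
    # Guard against an empty reference to avoid division by zero.
    return dp[n][m] / max(n, 1)


print(weighted_phone_feature_error_rate(["b", "d"], ["p", "d"]))
```

In this example, /b/ and /p/ differ only in voicing, so the substitution costs 0.2 rather than a full error. Spread over a two-phone reference, that gives a rate of 0.1. Under plain phone error rate, the same mistake would score 0.5, which is why a feature-weighted metric is gentler on near-miss transcriptions.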

phonetic-transcription · G2P · ASR
