Multimodal
OLaPh: Optimal Language Phonemizer
OLaPh (Optimal Language Phonemizer) is a hybrid phonemization framework for text-to-speech synthesis. It combines extensive multilingual lexica with NLP techniques and a statistical subword segmentation function. In evaluations on the WikiPron benchmark, OLaPh outperforms traditional methods in accuracy and is more robust on out-of-vocabulary terms. It also facilitates the creation of a high-consistency training corpus for instruction-tuned Large Language Models (LLMs), which in turn demonstrate strong generalization. The result is an open-source resource for researchers working on multilingual grapheme-to-phoneme (G2P) conversion.
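To make the idea of "lexicon lookup plus statistical subword segmentation" concrete, here is a minimal sketch of how such a fallback could work. It is not OLaPh's implementation: the toy lexicon, the frequency counts, the log-probability scoring, and the function names `segment` and `phonemize` are all assumptions made for illustration.

```python
import math

# Toy lexicon (hypothetical): entry -> (phonemes, relative frequency).
LEXICON = {
    "sun": ("sʌn", 50),
    "flower": ("flaʊɚ", 30),
    "flow": ("floʊ", 40),
    "er": ("ɚ", 5),
    "s": ("s", 1),
    "un": ("ʌn", 2),
}


def segment(word):
    """Return the highest-scoring split of `word` into lexicon entries, or None.

    Assumed scoring: sum of log relative frequencies of the pieces, which
    tends to favor fewer, more frequent pieces.
    """
    total = sum(freq for _, freq in LEXICON.values())
    # best[i] holds (score, pieces) for the prefix word[:i], or None if
    # that prefix cannot be covered by lexicon entries.
    best = [None] * (len(word) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(word) + 1):
        for start in range(end):
            piece = word[start:end]
            if best[start] is None or piece not in LEXICON:
                continue
            score = best[start][0] + math.log(LEXICON[piece][1] / total)
            if best[end] is None or score > best[end][0]:
                best[end] = (score, best[start][1] + [piece])
    return best[-1][1] if best[-1] else None


def phonemize(word):
    """Direct lexicon lookup first; fall back to subword segmentation."""
    word = word.lower()
    if word in LEXICON:
        return LEXICON[word][0]
    pieces = segment(word)
    if pieces is None:
        return None  # no full cover found; a real system needs another fallback
    return "".join(LEXICON[p][0] for p in pieces)
```

In this sketch, an out-of-vocabulary compound such as "sunflower" is split into "sun" + "flower" rather than "sun" + "flow" + "er", because the two-piece split scores higher under the assumed frequencies. A production system would also need to handle words that cannot be fully covered by lexicon entries.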
phonemization, tts, llm, neural-networks