AfroScope: A Framework for Studying the Linguistic Landscape of Africa
AfroScope is a new framework for language identification (LID) in Africa. It comprises AfroScope-Data, a dataset covering 640 languages, and AfroScope-Models, a suite of LID models. The models use a hierarchical classification approach. The framework also introduces AfroScope-Mirror, a specialized embedding model that helps disambiguate closely related languages and improves macro-F1 by 1.57 points on a subset of confusable languages. For practitioners, AfroScope addresses limitations of existing African LID systems and enables broader, more accurate language identification, which downstream NLP applications depend on.
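The summary does not say how AfroScope's hierarchy is built. As a minimal sketch, assume a two-stage scheme: first predict a group of related languages, then predict a language only among that group's members. The code below uses a toy nearest-centroid classifier over character trigrams. This is a swapped-in technique, not AfroScope-Models or AfroScope-Mirror. The language labels, the group mapping, and the training strings are all hypothetical placeholders, not AfroScope-Data. A `macro_f1` helper illustrates the metric cited above: per-language F1 averaged with equal weight, so rare but confusable languages count as much as frequent ones.

```python
from collections import Counter
import math


def char_ngrams(text, n=3):
    """Count character n-grams of a lowercased, space-padded string."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class CentroidClassifier:
    """Toy nearest-centroid classifier over summed n-gram counts (assumption)."""

    def fit(self, texts, labels):
        self.centroids = {}
        for text, label in zip(texts, labels):
            self.centroids.setdefault(label, Counter()).update(char_ngrams(text))
        return self

    def predict(self, text):
        feats = char_ngrams(text)
        return max(self.centroids, key=lambda lab: cosine(feats, self.centroids[lab]))


class HierarchicalLID:
    """Stage 1 picks a language group; stage 2 picks a language within it."""

    def __init__(self, lang_to_group):
        self.lang_to_group = lang_to_group

    def fit(self, texts, langs):
        groups = [self.lang_to_group[lang] for lang in langs]
        self.group_clf = CentroidClassifier().fit(texts, groups)
        # One language-level classifier per group, trained only on that group.
        self.lang_clfs = {}
        for g in set(groups):
            idx = [i for i, grp in enumerate(groups) if grp == g]
            self.lang_clfs[g] = CentroidClassifier().fit(
                [texts[i] for i in idx], [langs[i] for i in idx]
            )
        return self

    def predict(self, text):
        group = self.group_clf.predict(text)
        return self.lang_clfs[group].predict(text)


def macro_f1(gold, pred):
    """Unweighted mean of per-label F1 over all labels seen in gold or pred."""
    labels = set(gold) | set(pred)
    scores = []
    for lab in labels:
        tp = sum(g == lab and p == lab for g, p in zip(gold, pred))
        fp = sum(g != lab and p == lab for g, p in zip(gold, pred))
        fn = sum(g == lab and p != lab for g, p in zip(gold, pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy data: lang_a and lang_b are near-identical "related" languages.
LANG_TO_GROUP = {"lang_a": "group_1", "lang_b": "group_1", "lang_c": "group_2"}
TRAIN_TEXTS = ["kala mena tuka", "mena kala sopa",
               "kala mena tuki", "mena kali sopi",
               "zxq vrrb qzzx", "vrrb zxq xqzz"]
TRAIN_LANGS = ["lang_a", "lang_a", "lang_b", "lang_b", "lang_c", "lang_c"]

if __name__ == "__main__":
    lid = HierarchicalLID(LANG_TO_GROUP).fit(TRAIN_TEXTS, TRAIN_LANGS)
    for text in ["sopa kala", "sopi kali", "qzzx zxq"]:
        print(text, "->", lid.predict(text))  # lang_a, lang_b, lang_c
```

The design choice the hierarchy illustrates: the second stage only has to separate languages that already share a group, so a finer-grained model (in AfroScope's case, reportedly the Mirror embeddings) can be spent where confusion actually occurs.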
llm, language-identification, africa