Research
AfriSUD: A Dependency Treebank Collection for Evaluating Models on African Languages
AfriSUD is introduced as the first large-scale collection of syntactically annotated treebanks for nine African languages, built on the Surface-Syntactic Universal Dependencies (SUD) framework. The resource, verified by native speakers, highlights key syntactic features such as agglutination and tone, and is used to evaluate a range of models (non-transformer baselines, multilingual pretrained encoders, and LLMs) on part-of-speech tagging and dependency parsing. The findings indicate a significant syntax gap: current models struggle to represent the structural diversity of African languages, a limitation that matters for practitioners developing NLP solutions in these languages.
african languages, treebank, nlp