Research
When Similar Means Different: Evaluating LLMs on Arabic-Hebrew Cognates
The article introduces SemCog Bench, a benchmark of 1,858 Arabic-Hebrew word pairs for evaluating large language models (LLMs) on cognate identification and semantic disambiguation. The study finds that LLMs perform well on true cognates but significantly worse on false friends and loanwords, which suggests that they rely on surface-form similarity and points to a limitation in their cross-lingual reasoning. The benchmark gives practitioners a resource for improving multilingual semantic understanding in LLMs.
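The per-category comparison described above (true cognates versus false friends and loanwords) can be illustrated with a minimal sketch. It assumes each benchmark item reduces to a gold category and a model-predicted category. The record layout, the category names, and the function `accuracy_by_category` are hypothetical and are not taken from SemCog Bench, and the toy records are made up for illustration; they are not results from the study.

```python
from collections import defaultdict


def accuracy_by_category(records):
    """Return {gold_category: accuracy} for (gold, predicted) pairs.

    The (gold, predicted) layout is an assumption for illustration,
    not the benchmark's actual data format.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for gold, predicted in records:
        total[gold] += 1
        if gold == predicted:
            correct[gold] += 1
    return {category: correct[category] / total[category] for category in total}


# Made-up toy predictions, NOT results from the paper.
toy_records = [
    ("true_cognate", "true_cognate"),
    ("true_cognate", "true_cognate"),
    ("false_friend", "true_cognate"),   # surface similarity misleads the model
    ("false_friend", "false_friend"),
    ("loanword", "true_cognate"),
]

scores = accuracy_by_category(toy_records)
for category, accuracy in scores.items():
    print(f"{category}: {accuracy:.2f}")
```

Reporting accuracy per category rather than one pooled score is what makes a gap between true cognates and false friends visible; a single aggregate number would hide it.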
llm arabic hebrew cognates