Research
Not Truly Multilingual: Script Consistency as a Missing Dimension in VLM Evaluation
The paper introduces PuMVR, a benchmark for evaluating Vision-Language Models (VLMs) on 1,000 parallel image-text instances rendered in three scripts of Punjabi: Gurmukhi, Shahmukhi, and Roman. Evaluations of 10 state-of-the-art VLMs reveal a significant Script Gap: accuracy on identical tasks differs by up to 16% depending on the script. This exposes a blind spot in current multilingual assessments, which overlook multi-script scenarios. The authors propose the Script Consistency Rate (SCR) as a metric for script-agnostic performance and emphasize the need for equitable AI access in multi-script language environments.
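To make the two quantities concrete, here is a minimal sketch of how a script gap and a consistency rate might be computed from per-script model outputs. The summary does not give the paper's formulas, so both definitions are assumptions: the gap is taken as the difference between the best and worst per-script accuracy, and SCR as the fraction of instances on which the model returns the same answer in all three scripts. The function names and the toy data are hypothetical and are not PuMVR results.

```python
from typing import Dict, List

# The three scripts named in the summary; lowercase keys are an illustrative choice.
SCRIPTS = ("gurmukhi", "shahmukhi", "roman")


def accuracy_per_script(preds: Dict[str, List[str]], gold: List[str]) -> Dict[str, float]:
    """Accuracy of the model's answers in each script against shared gold labels."""
    return {
        script: sum(p == g for p, g in zip(preds[script], gold)) / len(gold)
        for script in SCRIPTS
    }


def script_gap(preds: Dict[str, List[str]], gold: List[str]) -> float:
    """ASSUMED definition: best per-script accuracy minus worst per-script accuracy."""
    acc = accuracy_per_script(preds, gold)
    return max(acc.values()) - min(acc.values())


def script_consistency_rate(preds: Dict[str, List[str]]) -> float:
    """ASSUMED definition: share of instances answered identically in every script."""
    rows = list(zip(*(preds[script] for script in SCRIPTS)))
    return sum(len(set(row)) == 1 for row in rows) / len(rows)


# Toy data for four parallel instances (made up for illustration only).
gold = ["A", "B", "C", "D"]
preds = {
    "gurmukhi": ["A", "B", "C", "D"],
    "shahmukhi": ["A", "B", "C", "A"],
    "roman": ["A", "C", "C", "B"],
}

gap = script_gap(preds, gold)
scr = script_consistency_rate(preds)
print(f"script gap: {gap:.2f}, SCR: {scr:.2f}")
```

Under these assumed definitions the two numbers measure different things: the gap compares aggregate accuracy across scripts, while the consistency rate checks agreement instance by instance, so a model could show a small gap and still answer many individual items differently from one script to the next.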
multilingual, vlm-evaluation