Research · arXiv cs.CL · 8 d ago

Does the Judge Prefer English? Evaluating Language-Switching Invariance in LLM-as-a-Judge

The paper introduces Judge-LS, a meta-evaluation protocol for testing whether large language models (LLMs) used as judges of instruction following are invariant to language switching. The study evaluates four API-accessible judges on the LLMBar benchmark. Language-switching presentations shift judge preferences toward English by 10.7–14.4%. However, most translation-equivalent probes are judged as ties, which suggests the judges have no systematic preference for English. For practitioners, the results expose language-dependent bias in LLM-based evaluation and can inform the design of more robust evaluation frameworks.
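The headline numbers are easier to interpret with a concrete metric in mind. The sketch below computes a net preference shift toward English and a tie rate from pairwise judge verdicts. The `Probe` record, the three-way verdict labels, and the "net flip" definition of shift are illustrative assumptions. They are not the published Judge-LS definitions.

```python
from dataclasses import dataclass
from typing import Literal, Sequence

# Hypothetical verdict labels for a pairwise judge: prefer A, prefer B, or tie.
Verdict = Literal["A", "B", "tie"]


@dataclass
class Probe:
    """One hypothetical probe: the same pair judged twice (assumed setup)."""
    baseline: Verdict                # verdict when both responses share one language
    switched: Verdict                # verdict when one response is presented in English
    english_side: Literal["A", "B"]  # which response was shown in English


def _score(verdict: Verdict, side: str) -> int:
    """+1 if the verdict favors `side`, -1 if it favors the other response, 0 for a tie."""
    if verdict == "tie":
        return 0
    return 1 if verdict == side else -1


def _sign(x: int) -> int:
    return (x > 0) - (x < 0)


def english_preference_shift(probes: Sequence[Probe]) -> float:
    """Fraction of probes whose verdict moved toward the English side,
    minus the fraction that moved away (assumed definition, range [-1, 1])."""
    if not probes:
        raise ValueError("need at least one probe")
    moves = [
        _sign(_score(p.switched, p.english_side) - _score(p.baseline, p.english_side))
        for p in probes
    ]
    return sum(moves) / len(probes)


def tie_rate(verdicts: Sequence[Verdict]) -> float:
    """Share of verdicts that are ties, e.g. on translation-equivalent probes."""
    if not verdicts:
        raise ValueError("need at least one verdict")
    return sum(v == "tie" for v in verdicts) / len(verdicts)


if __name__ == "__main__":
    probes = [
        Probe(baseline="B", switched="A", english_side="A"),    # flips toward English
        Probe(baseline="A", switched="A", english_side="A"),    # unchanged
        Probe(baseline="tie", switched="A", english_side="A"),  # tie -> English
        Probe(baseline="A", switched="tie", english_side="A"),  # English -> tie
    ]
    print(f"{english_preference_shift(probes):.1%}")  # prints "25.0%"
    print(tie_rate(["tie", "tie", "A", "B"]))          # prints 0.5
```

A signed net-flip rate is used here so that verdict changes in both directions offset each other. Under this definition, a judge that flips at random shows a shift near zero rather than an inflated number.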

Tags: llm · language-switching · evaluation
