Research · arXiv cs.CL · 7 d ago

MÖVE: A Holistic LLM Benchmark for the German Public Sector

MÖVE (Modelle für die Öffentliche Verwaltung Evaluieren, "Evaluating Models for Public Administration") is a new benchmark for assessing large language models (LLMs) for use in the German public sector. It evaluates 39 models against both performance and governance criteria, using ten German-language datasets and a multi-metric evaluation strategy. The results show that no single model excels across all tasks and that model size is not a reliable indicator of quality. For practitioners, the benchmark offers a framework for model selection in public administration and addresses gaps in the existing evaluation landscape.

