ai-digest.dev
Research · arXiv cs.CL · 7 d ago

MÖVE: A Holistic LLM Benchmark for the German Public Sector

MÖVE (Modelle für die Öffentliche Verwaltung Evaluieren, roughly "evaluating models for public administration") is a new benchmark for assessing large language models (LLMs) in the context of the German public sector. It evaluates 39 models against both performance and governance criteria, using ten German-language datasets and a multi-metric evaluation strategy. The results show that no single model performs best across all tasks and that model size is not a reliable indicator of quality. For practitioners, the benchmark offers a comprehensive framework for selecting models in public administration and fills gaps in the existing evaluation landscape.
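The two headline findings can be made concrete with a small analysis over a per-task score table: find the top model on each task, check whether any one model wins everywhere, and measure the rank correlation between parameter count and mean score. The sketch below is illustrative only. The model names, task names, scores, and parameter counts are hypothetical placeholders, not MÖVE data, and the code does not reproduce the benchmark's actual multi-metric scoring.

```python
from statistics import mean

# Hypothetical toy data (NOT MÖVE results): model -> task -> score in [0, 1]
SCORES = {
    "model_a": {"task_1": 0.82, "task_2": 0.61, "task_3": 0.70},
    "model_b": {"task_1": 0.75, "task_2": 0.68, "task_3": 0.66},
    "model_c": {"task_1": 0.79, "task_2": 0.59, "task_3": 0.74},
    "model_d": {"task_1": 0.70, "task_2": 0.64, "task_3": 0.62},
}
# Hypothetical parameter counts, in billions
PARAMS_B = {"model_a": 70.0, "model_b": 8.0, "model_c": 24.0, "model_d": 120.0}


def best_per_task(scores):
    """Return {task: name of the highest-scoring model on that task}."""
    tasks = next(iter(scores.values())).keys()
    return {t: max(scores, key=lambda m: scores[m][t]) for t in tasks}


def dominant_model(scores):
    """Return the model that is best on every task, or None if no such model exists."""
    winners = set(best_per_task(scores).values())
    return winners.pop() if len(winners) == 1 else None


def ranks(values):
    """1-based ranks (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the ranks of x and y."""
    rx, ry = ranks(x), ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


if __name__ == "__main__":
    print(best_per_task(SCORES))
    print(dominant_model(SCORES))
    models = sorted(SCORES)
    rho = spearman([PARAMS_B[m] for m in models],
                   [mean(SCORES[m].values()) for m in models])
    print(f"Spearman rho(size, mean score) = {rho:.2f}")
```

A rank correlation is used here rather than a linear one because the question is whether larger models tend to rank higher, not whether quality grows linearly with parameter count. With real benchmark data, a library routine such as `scipy.stats.spearmanr` would normally replace the hand-written helper.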

benchmark · public-sector · llm