Research · arXiv cs.CL · 7 d ago

From Benchmarks to Skills: Low-Rank Factors for LLM Evaluation

This article introduces an evaluation framework for large language models (LLMs) based on factor analysis (FA) of a performance matrix covering 60 models and 44 benchmarks. The matrix is intrinsically low-rank: a small number of latent factors capture most of the variation in model performance. This exposes redundancy among existing benchmarks and lets practitioners profile models, identify redundant tasks, and select models by skill profile. The framework offers a more interpretable alternative to traditional aggregate scoring.
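To make the low-rank idea concrete, here is a minimal sketch of how one might check whether a models-by-benchmarks score matrix is dominated by a few latent directions. It is not the paper's method: it uses a plain SVD on standardized scores (a PCA-style proxy) rather than factor analysis, and the data are synthetic. The function names, the three planted factors, the noise level, and the 90% threshold are all assumptions chosen for illustration; only the 60 x 44 shape comes from the summary above.

```python
import numpy as np


def explained_variance_by_rank(scores):
    """Cumulative fraction of variance captured by the top-k singular directions."""
    # Standardize each benchmark column so no benchmark dominates by scale.
    centered = scores - scores.mean(axis=0)
    std = centered.std(axis=0)
    std[std == 0] = 1.0  # guard against constant columns
    z = centered / std
    singular_values = np.linalg.svd(z, compute_uv=False)
    variance = singular_values ** 2
    return np.cumsum(variance) / variance.sum()


def make_synthetic_scores(n_models=60, n_benchmarks=44, n_factors=3,
                          noise=0.1, seed=0):
    """Hypothetical data: scores = latent skills x benchmark loadings + noise."""
    rng = np.random.default_rng(seed)
    skills = rng.normal(size=(n_models, n_factors))
    loadings = rng.normal(size=(n_factors, n_benchmarks))
    return skills @ loadings + noise * rng.normal(size=(n_models, n_benchmarks))


scores = make_synthetic_scores()
cumulative = explained_variance_by_rank(scores)

# Smallest number of components whose cumulative share reaches 90%.
k = int(np.searchsorted(cumulative, 0.9) + 1)
print(f"{k} of {len(cumulative)} components explain at least 90% of the variance")
```

On real leaderboard data the same check would show how quickly the cumulative curve saturates. A fitted factor model with rotated loadings would be the next step toward interpretable skill profiles.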

Tags: llm, evaluation, low-rank
