Research
JE-IRT: A Geometric Lens on LLM Abilities through Joint Embedding Item Response Theory
The article introduces JE-IRT, a geometric item-response theory framework that embeds both large language models (LLMs) and questions in a shared space, so that LLM capabilities are evaluated along multiple dimensions rather than compressed into a single score. In a question embedding, the direction encodes the question's semantics and the norm encodes its difficulty. This geometry supports the analysis of out-of-distribution behavior and reveals an LLM-internal taxonomy. A new LLM can be integrated by fitting it to the existing embedding space, which aids generalization and gives a more interpretable way to assess model abilities.
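To make the geometry concrete, here is a minimal sketch in plain Python. The summary does not give the exact scoring rule, so the one used here is an assumption for illustration only: the probability of a correct answer is a logistic function of the LLM embedding projected onto the question's direction, minus the question's norm. The names `p_correct` and `fit_new_llm`, the toy two-dimensional embeddings, and the gradient-ascent settings are all hypothetical and are not taken from the article.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_correct(theta, q):
    """Probability that an LLM with embedding `theta` answers question `q`.

    ASSUMED rule (not from the article):
        sigmoid(theta . q/||q||  -  ||q||)
    The direction q/||q|| plays the role of topic, the norm ||q|| of difficulty.
    """
    norm = math.sqrt(sum(v * v for v in q))
    alignment = sum(t * v for t, v in zip(theta, q)) / norm
    return sigmoid(alignment - norm)


def fit_new_llm(questions, responses, steps=500, lr=0.5):
    """Fit one embedding for a new LLM while question embeddings stay frozen.

    Plain gradient ascent on the Bernoulli log-likelihood under the
    assumed rule above; `responses` holds 1 (correct) or 0 (incorrect).
    """
    dim = len(questions[0])
    theta = [0.0] * dim
    for _ in range(steps):
        grad = [0.0] * dim
        for q, y in zip(questions, responses):
            norm = math.sqrt(sum(v * v for v in q))
            residual = y - p_correct(theta, q)
            for i in range(dim):
                # d(logit)/d(theta_i) is the i-th component of q/||q||
                grad[i] += residual * q[i] / norm
        theta = [t + lr * g / len(questions) for t, g in zip(theta, grad)]
    return theta


# Toy frozen question embeddings: axis 0 and axis 1 stand for two topics,
# and longer vectors stand for harder questions.
questions = [[0.5, 0.0], [1.0, 0.0], [0.0, 0.5], [0.0, 1.0]]
responses = [1, 1, 0, 0]  # the new LLM is right on topic 0, wrong on topic 1

theta_new = fit_new_llm(questions, responses)
print("fitted embedding:", theta_new)
print("P(correct), topic 0:", p_correct(theta_new, [0.8, 0.0]))
print("P(correct), topic 1:", p_correct(theta_new, [0.0, 0.8]))
```

Under these assumptions, the fitted embedding points toward the topic the new LLM answers correctly, and for a fixed direction a larger question norm lowers the predicted probability of a correct answer. Only the single vector `theta` is learned; the question embeddings are never updated.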
llm, evaluation, item response theory