Research
An LLM-Native Psychometric Instrument Does Not Predict LLM Behavior: Evidence Across 25 Models
This study presents a psychometric instrument designed specifically to assess LLM behavior, constructed from LLM behavioral affordances through exploratory factor analysis. The instrument comprises 300 items across 12 behavioral dimensions, and the analysis identified a 5-factor structure with high factor congruence (Tucker φ ≥ 0.957) and high internal consistency (α ≥ 0.930). Although LLM self-reports on the instrument were stable, they showed minimal predictive validity for actual behavior. This disconnect between LLM self-reports and human evaluations has implications for alignment and evaluation frameworks in LLM applications.
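For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how they are conventionally computed. It assumes that α refers to Cronbach's alpha and uses the textbook formulas for both quantities; the function names `tucker_phi` and `cronbach_alpha` are hypothetical and are not taken from the study's code or data.

```python
import numpy as np


def tucker_phi(loadings_a, loadings_b):
    """Tucker's congruence coefficient between two factor-loading vectors.

    phi = sum(a * b) / sqrt(sum(a^2) * sum(b^2)); values near 1 mean the two
    factors have nearly proportional loading patterns.
    """
    a = np.asarray(loadings_a, dtype=float)
    b = np.asarray(loadings_b, dtype=float)
    return float(np.sum(a * b) / np.sqrt(np.sum(a ** 2) * np.sum(b ** 2)))


def cronbach_alpha(scores):
    """Cronbach's alpha for a (respondents x items) score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total score)).
    Assumption: this is the alpha the abstract refers to.
    """
    x = np.asarray(scores, dtype=float)
    k = x.shape[1]                          # number of items in the scale
    item_var = x.var(axis=0, ddof=1)        # sample variance of each item
    total_var = x.sum(axis=1).var(ddof=1)   # variance of the summed score
    return float(k / (k - 1) * (1.0 - item_var.sum() / total_var))
```

Both functions measure properties of the instrument itself (structural replicability and internal consistency). Neither says anything about whether scores predict behavior, which is why an instrument can score highly on both and still show minimal predictive validity.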
llm psychometrics behavior