Safety · arXiv cs.AI · 7 d ago

Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior

The paper compares two bases for self-reports (SR) as predictors of large language model (LLM) behavior: the Big Five personality traits and the Theory of Planned Behavior (TPB). In experiments with 11 frontier LLMs, TPB-based self-reports reach human-level coherence within shared conversational contexts, while Big Five self-reports do not. The results suggest that reliable psychometric evaluation and safe deployment of LLMs call for behavior-specific assessment tools, since traditional personality frameworks may not capture how LLM behavior varies across contexts.

psychometric evaluation · llm · behavior prediction
