Research · arXiv cs.AI · 10 d ago

Beyond Scalars: Evaluating and Understanding LLM Reasoning via Geometric Progress and Stability

The paper introduces TRACED, a framework that evaluates the reasoning quality of large language models (LLMs) through geometric kinematics. It scores a reasoning trajectory on two metrics: Progress (displacement) and Stability (curvature). Effective reasoning shows up as high-progress, stable trajectories, while hallucinations correspond to low-progress, unstable ones. The authors report competitive performance and robustness across several benchmarks, and present the framework as a way to understand LLM internal dynamics and to help practitioners assess and improve LLM reliability.
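To make the two quantities concrete, here is a minimal sketch of one way displacement and curvature could be computed over a sequence of hidden-state vectors. The summary does not give TRACED's actual formulas, so everything below is an assumption: the function names `progress` and `mean_curvature`, the normalization by path length, and the use of turning angles as a curvature proxy are illustrative choices, not the paper's definitions.

```python
import numpy as np


def progress(trajectory):
    """Assumed proxy for Progress: net displacement / total path length.

    Returns 1.0 for a straight-line trajectory and values near 0.0 for a
    trajectory that wanders but ends close to where it started.
    """
    traj = np.asarray(trajectory, dtype=float)
    steps = np.diff(traj, axis=0)                      # step vectors between states
    path_length = np.linalg.norm(steps, axis=1).sum()  # total distance travelled
    if path_length == 0.0:
        return 0.0
    return float(np.linalg.norm(traj[-1] - traj[0]) / path_length)


def mean_curvature(trajectory):
    """Assumed proxy for (in)stability: mean turning angle in radians.

    0.0 means every step points the same way; pi means every step
    reverses the previous one.
    """
    traj = np.asarray(trajectory, dtype=float)
    steps = np.diff(traj, axis=0)
    norms = np.linalg.norm(steps, axis=1)
    keep = norms > 0                                   # drop zero-length steps
    steps, norms = steps[keep], norms[keep]
    if len(steps) < 2:
        return 0.0
    unit = steps / norms[:, None]
    cosines = np.clip((unit[:-1] * unit[1:]).sum(axis=1), -1.0, 1.0)
    return float(np.arccos(cosines).mean())


# Toy 2-D trajectories standing in for per-step hidden states (hypothetical data).
straight = [[0, 0], [1, 0], [2, 0], [3, 0]]
zigzag = [[0, 0], [1, 0], [0, 0], [1, 0]]

print(progress(straight), mean_curvature(straight))
print(progress(zigzag), mean_curvature(zigzag))
```

Under these assumed definitions, the straight trajectory scores high progress with zero turning, while the back-and-forth trajectory scores low progress with large turning angles, which mirrors the high-progress/stable versus low-progress/unstable contrast described in the summary.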

llm · reasoning · evaluation · geometry
