Research
Reasoning Models Know What's Important, and Encode It in Their Activations
The study investigates how reasoning models encode the importance of individual reasoning steps in their activations, and finds that these internal representations carry more information about step importance than the generated tokens do. Probes trained on the activations show that a model can represent how important a step is before it generates the subsequent steps, and different models agree strongly on which steps are critical. For practitioners working with LLMs, the results suggest that analyzing activations, rather than surface-level features of the text, can give deeper insight into reasoning processes.
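As a rough illustration of what "training probes on activations" can mean, here is a minimal sketch of a linear probe fit with ridge regression. It is not the study's method or data: the activations and importance scores are synthetic, and the names `fit_linear_probe`, `probe_predict`, and `make_synthetic_steps`, as well as the sizes and the `l2` penalty, are assumptions made up for this example.

```python
import numpy as np

def fit_linear_probe(acts, targets, l2=1e-2):
    """Fit a ridge-regression probe from activations (n, d) to scores (n,)."""
    # Append a constant column so the probe can learn a bias term.
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    reg = l2 * np.eye(X.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalised
    # Closed-form ridge solution: (X^T X + reg)^-1 X^T y
    return np.linalg.solve(X.T @ X + reg, X.T @ targets)

def probe_predict(weights, acts):
    """Apply a fitted probe to new activations."""
    X = np.hstack([acts, np.ones((acts.shape[0], 1))])
    return X @ weights

def make_synthetic_steps(n_steps=400, d_model=32, noise=0.1, seed=0):
    """Hypothetical stand-in for per-step activations and importance labels.

    ASSUMPTION: importance is a noisy linear readout of one hidden direction.
    Real activations would come from a model's hidden states, and real labels
    from some separate measure of how much each step matters.
    """
    rng = np.random.default_rng(seed)
    direction = rng.normal(size=d_model)
    direction /= np.linalg.norm(direction)
    acts = rng.normal(size=(n_steps, d_model))
    importance = acts @ direction + noise * rng.normal(size=n_steps)
    return acts, importance

acts, importance = make_synthetic_steps()
train, test = slice(0, 300), slice(300, 400)

weights = fit_linear_probe(acts[train], importance[train])
pred = probe_predict(weights, acts[test])

# Held-out correlation between probe predictions and the synthetic labels.
corr = np.corrcoef(pred, importance[test])[0, 1]
print(f"held-out correlation: {corr:.3f}")
```

On real data, the same held-out comparison could be repeated with token-level features as a baseline, to check how much more the activations capture than the tokens alone.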
llm, reasoning, activations