Research · arXiv cs.AI · 21 h ago

Why Does Reasoning Length Converge? Unveiling the Underfitting-Overfitting Trade-off in Chain-of-Thought

The paper introduces CoT-Space, a theoretical framework that recasts Large Language Model (LLM) reasoning from discrete token prediction as an optimization problem over a continuous semantic space, addressing the limitations of traditional token-level analysis. Within this framework, the authors explain the convergence of Chain-of-Thought (CoT) reasoning length as the outcome of an underfitting-overfitting trade-off, and they support this account with Reinforcement Learning (RL) experiments. The framework gives practitioners a principled basis for tuning reasoning trajectories, with the aim of making LLMs more efficient and effective on complex reasoning tasks.

Tags: llm · reasoning · underfitting · overfitting
