Research
SuCo: Sufficiency-guided Continuous Adaptive Reasoning
The paper introduces SuCo, a two-stage training framework that uses Minimal Sufficient Chain-of-Thought (MSC) to make Large Reasoning Models (LRMs) reason more efficiently. The first stage, MSC-Aligned Fine-Tuning (MFT), adapts reasoning patterns to question difficulty. The second stage, Sufficiency-Aware Policy Optimization (SAPO), further optimizes reasoning through reinforcement learning. Empirical results show that SuCo reduces reasoning tokens while improving accuracy across various benchmarks, which makes it relevant to practitioners who want LLMs that are both more efficient and more accurate.
reasoning, llm, efficiency
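To make the idea of a "minimal sufficient" chain-of-thought in the summary above more concrete, here is a small illustrative sketch. It is not the paper's method: the summary does not describe how MSC data is built or how SAPO scores responses. The sketch simply assumes that "minimal sufficient" can be read as "the shortest sampled reasoning chain that still reaches the correct answer". The names `Candidate`, `token_count`, and `minimal_sufficient` are hypothetical, and whitespace splitting stands in for a real tokenizer.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    """One sampled reasoning chain for a question (hypothetical structure)."""
    reasoning: str
    answer: str


def token_count(text: str) -> int:
    # Assumption: whitespace tokens approximate the model's token count.
    return len(text.split())


def minimal_sufficient(candidates: Sequence[Candidate], gold: str) -> Optional[Candidate]:
    """Return the shortest candidate whose answer matches `gold`, or None.

    "Sufficient" here means only that the final answer is correct;
    this is an illustrative reading, not the paper's definition.
    """
    correct = [c for c in candidates if c.answer.strip() == gold.strip()]
    if not correct:
        return None
    return min(correct, key=lambda c: token_count(c.reasoning))


# Usage: three sampled chains for the question "What is 2 + 2?"
samples = [
    Candidate("2 plus 2 is 4 , double check : 2 + 2 = 4 , yes", "4"),
    Candidate("2 + 2 = 4", "4"),
    Candidate("guess", "5"),
]
best = minimal_sufficient(samples, gold="4")
no_match = minimal_sufficient([Candidate("guess", "5")], gold="4")
```

Under this reading, the long self-checking chain and the short chain are both sufficient, and the short one is selected. The incorrect one-word guess is excluded even though it is the shortest overall. A harder question would presumably need a longer chain before any candidate became sufficient, which is one way to picture reasoning length adapting to difficulty.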