Research
ReSum: Synergizing LLM Reasoning and Summarization with Reinforcement Learning
ReSum is a framework that improves long-horizon reasoning in Large Language Models (LLMs) through Reinforcement Learning with Verifiable Rewards (RLVR) by having the model summarize its own reasoning trajectories. Its summarization-aware adaptive rollout mechanism improves coherence and reduces token-level entropy, yielding an average performance gain of 4% while shortening rollouts by 18.6%. For practitioners, this lets an LLM manage its own reasoning more effectively, which may lead to more efficient and more accurate outputs on complex tasks.
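To make the "token-level entropy" figure concrete, here is a minimal sketch that assumes the standard definition: the Shannon entropy of the model's next-token distribution at each step, averaged over the sequence. The summary does not say how ReSum measures it, so the function names and the toy 4-token vocabulary below are illustrative assumptions, not the paper's code.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a single next-token distribution."""
    # Zero-probability tokens contribute nothing, so skip them to avoid log(0).
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def mean_token_entropy(step_distributions):
    """Average per-token entropy over all steps of a generated sequence."""
    entropies = [token_entropy(p) for p in step_distributions]
    return sum(entropies) / len(entropies)

# Hypothetical toy data: one confident step and one maximally uncertain step
# over a 4-token vocabulary.
confident = [0.97, 0.01, 0.01, 0.01]
uncertain = [0.25, 0.25, 0.25, 0.25]
rollout = [confident, uncertain]

print(f"confident step: {token_entropy(confident):.4f} nats")
print(f"uncertain step: {token_entropy(uncertain):.4f} nats")
print(f"rollout mean:   {mean_token_entropy(rollout):.4f} nats")
```

A lower mean value indicates that the model's token choices were, on average, more sharply peaked; a uniform distribution over a vocabulary of size V gives the maximum, log V.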
llm reasoning summarization