Research
Beyond Reasoning Gains: Mitigating General-Capability Forgetting in Large Reasoning Models
The article presents RECAP, a replay strategy designed to mitigate general-capability forgetting in large reasoning models trained with Reinforcement Learning with Verifiable Rewards (RLVR). It combines replay with dynamic objective reweighting to preserve general knowledge while improving reasoning performance, and it does so without training additional models or extensive tuning. Experiments with Qwen2.5-VL-3B and Qwen2.5-VL-7B indicate that RECAP preserves foundational skills while letting the training focus shift flexibly across objectives. This matters for practitioners who want to improve reasoning in LLMs and vision-language models without eroding their general capabilities.
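The summary does not state RECAP's actual reweighting rule, so the following is only an illustrative sketch of what dynamic objective reweighting can look like in general. Every name here (`reweight_objectives`, `combine_losses`, `window`, `temperature`) is hypothetical, as is the scoring rule: each objective gets more weight when its recent scores are low or unstable, and less when they have saturated.

```python
import math
from statistics import mean, pstdev


def reweight_objectives(histories, window=4, temperature=1.0):
    """Return one weight per objective; the weights sum to 1.

    histories maps an objective name to its recent scores in [0, 1].
    ASSUMPTION: the scoring rule (gap to a perfect score plus short-horizon
    volatility, passed through a softmax) is illustrative, not RECAP's rule.
    """
    scores = {}
    for name, history in histories.items():
        recent = history[-window:]
        if not recent:
            # No evidence yet: treat the objective as fully unsolved.
            gap, volatility = 1.0, 0.0
        else:
            gap = 1.0 - mean(recent)      # room left to improve
            volatility = pstdev(recent)   # instability over the window
        scores[name] = (gap + volatility) / temperature

    # Numerically stable softmax over the per-objective scores.
    top = max(scores.values())
    exps = {name: math.exp(s - top) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def combine_losses(losses, weights):
    """Weighted sum of per-objective losses for a single training step."""
    return sum(weights[name] * losses[name] for name in losses)


if __name__ == "__main__":
    # Hypothetical objectives: one reasoning task still improving,
    # one replayed general-capability task that has saturated.
    histories = {
        "reasoning": [0.20, 0.30, 0.35, 0.40],
        "general_replay": [0.90, 0.90, 0.90, 0.90],
    }
    weights = reweight_objectives(histories)
    print(weights)
    print(combine_losses({"reasoning": 1.2, "general_replay": 0.3}, weights))
```

In this sketch the weights are recomputed from a sliding window at each step, so the training focus can move between the reasoning objective and the replayed objectives without a hand-tuned fixed mixing ratio. The softmax temperature controls how sharply that focus shifts.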
reinforcement_learning, capability_forgetting, large_models