Training · arXiv cs.AI · 4 d ago

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

The paper proposes "Rejuvenation", a method for improving the handoff from Supervised Fine-Tuning (SFT) to Reinforcement Learning (RL) in Large Language Models (LLMs). It finds that excessive SFT reduces model plasticity, which shows up as over-confident token distributions and sharp parameter landscapes, both of which hinder RL optimization. Rejuvenation combines base-anchored model fusion with targeted neuron resets to restore that plasticity, and the paper reports improved RL performance and better generalization on out-of-distribution tasks.
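The summary does not say how the fusion is computed. One common way to anchor a fine-tuned model to its base is linear weight interpolation, and a low token entropy is one simple signal of an over-confident distribution. The sketch below illustrates both under those assumptions; it is not the paper's method. The names `fuse_with_base`, `token_entropy`, and `alpha` are hypothetical, and parameters are plain Python lists rather than real model tensors.

```python
import math


def fuse_with_base(base, sft, alpha):
    """Interpolate each parameter as alpha * base + (1 - alpha) * sft.

    Assumption: plain linear interpolation stands in for "base-anchored
    model fusion"; the paper may use a different rule.
    `base` and `sft` map parameter names to lists of floats.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if base.keys() != sft.keys():
        raise ValueError("base and sft must have the same parameter names")
    fused = {}
    for name in base:
        if len(base[name]) != len(sft[name]):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        fused[name] = [
            alpha * b + (1.0 - alpha) * s
            for b, s in zip(base[name], sft[name])
        ]
    return fused


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution.

    Values near zero mean almost all mass sits on a single token,
    i.e. an over-confident distribution.
    """
    return -sum(p * math.log(p) for p in probs if p > 0.0)


if __name__ == "__main__":
    base = {"w": [0.0, 2.0]}
    sft = {"w": [1.0, 4.0]}
    # alpha = 0 keeps the SFT weights; alpha = 1 recovers the base model.
    print(fuse_with_base(base, sft, alpha=0.5))
    print(token_entropy([0.5, 0.5]), token_entropy([1.0, 0.0]))
```

In this sketch `alpha` sets how far the SFT checkpoint is pulled back toward the base model; how the paper chooses that trade-off, and which neurons it resets, is not stated in the summary.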

SFT · RL · model plasticity
