Research
Demystifying Hidden-State Recurrence: Switchable Latent Reasoning with On-Policy Reinforcement Learning
The article introduces SWITCH, a switchable latent reasoning framework trained with on-policy reinforcement learning (RL). SWITCH uses explicit boundary tokens (<swi> and </swi>) to mark where hidden-state recurrence begins and ends. Because these boundaries are discrete tokens, they facilitate gradient propagation during training and allow the latent steps to be probed directly, which improves both optimization and causal interpretability. SWITCH outperforms previous hidden-state-recurrence reasoning methods, and the analysis reveals the switching policy the model learns and the effective computation it performs during latent steps. These findings are relevant to practitioners building interpretable and efficient RL systems.
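To make the role of the boundary tokens concrete, here is a minimal sketch of a decode loop that switches between token mode and latent mode. It is not the article's implementation: the `step` and `embed` interfaces, the token ids, the toy model, and the rule that any prediction other than </swi> continues the latent span are all assumptions made for illustration.

```python
from typing import Callable, List, Tuple

Vector = List[float]
# Hypothetical model interface: input vector -> (hidden state, predicted token id).
StepFn = Callable[[Vector], Tuple[Vector, int]]
EmbedFn = Callable[[int], Vector]

# Hypothetical token ids for the toy vocabulary.
BOS, SWI, END_SWI, ANS, EOS = 0, 1, 2, 3, 4


def switchable_decode(step: StepFn, embed: EmbedFn, first_token: int,
                      swi: int, end_swi: int, eos: int,
                      max_steps: int = 32) -> Tuple[List[int], int]:
    """Greedy decode that feeds hidden states back between <swi> and </swi>."""
    tokens = [first_token]
    latent_steps = 0
    in_latent = False
    x = embed(first_token)
    for _ in range(max_steps):
        hidden, tok = step(x)
        if in_latent and tok != end_swi:
            # Latent step (assumed rule): emit nothing and reuse the hidden
            # state as the next input instead of a token embedding.
            latent_steps += 1
            x = hidden
            continue
        # Token step: emit the discrete token and feed its embedding next.
        tokens.append(tok)
        if tok == eos:
            break
        if tok == swi:
            in_latent = True
        elif tok == end_swi:
            in_latent = False
        x = embed(tok)
    return tokens, latent_steps


def toy_embed(tok: int) -> Vector:
    # Toy embedding: token id in slot 0, a step counter in slot 1.
    return [float(tok), 0.0]


def toy_step(x: Vector) -> Tuple[Vector, int]:
    # Toy stand-in for a model: the hidden state increments the counter.
    hidden = [x[0], x[1] + 1.0]
    if x[0] == 0.0:
        tok = SWI                                  # open a latent span after BOS
    elif x[0] == 1.0:
        tok = END_SWI if x[1] >= 3.0 else ANS      # close after three latent steps
    elif x[0] == 2.0:
        tok = ANS                                  # answer once the span is closed
    else:
        tok = EOS
    return hidden, tok


tokens, latent_steps = switchable_decode(toy_step, toy_embed, BOS, SWI, END_SWI, EOS)
print(tokens, latent_steps)
```

In this sketch the only discrete decisions are the boundary tokens and the visible output tokens, so an on-policy RL objective could score exactly those choices, while the latent steps remain ordinary differentiable computation on the hidden state.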
reinforcement learning, latent reasoning, causal analysis