Entropy-Gated Latent Recursion
The paper introduces Entropy-Gated Latent Recursion (EGLR), a deterministic decoding method intended to improve language model reasoning. For tokens where the model's uncertainty is high, EGLR re-applies the top decoder layers; varying the layer span ($L$) yields a diverse set of rollouts without stochasticity. The authors evaluate the approach on eight instruction-tuned models across six math reasoning benchmarks. On MATH-500 with Qwen2.5-3B-Instruct, it reaches an accuracy of 91.6%, which the paper reports as significantly higher than temperature-only sampling. Because it supplies a richer set of rollouts for downstream applications, the method points to a new paradigm for inference-time scaling that leverages deterministic computation instead of relying solely on stochastic sampling.
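The gating idea can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation, and several details are assumptions that the summary above does not specify: the gate is taken to be the Shannon entropy of the next-token distribution compared against a fixed `threshold`, the top `span` layers are re-applied exactly once, and layers are modeled as plain functions on a list of floats. The names `decode_step`, `lm_head`, `span`, and `threshold` are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def entropy(probs: Sequence[float]) -> float:
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def decode_step(
    hidden: Vector,
    layers: Sequence[Layer],
    lm_head: Callable[[Vector], Vector],
    span: int,
    threshold: float,
) -> Tuple[int, bool]:
    """One greedy decoding step with an entropy gate (illustrative assumption).

    Runs the full layer stack once. If the next-token entropy exceeds
    `threshold`, the top `span` layers are applied one more time before the
    token is chosen. Returns (token_index, whether_recursion_fired).
    """
    if not 1 <= span <= len(layers):
        raise ValueError("span must be between 1 and the number of layers")

    for layer in layers:
        hidden = layer(hidden)
    probs = softmax(lm_head(hidden))

    recursed = False
    if entropy(probs) > threshold:
        # High uncertainty: refine the hidden state with the top `span` layers.
        for layer in layers[-span:]:
            hidden = layer(hidden)
        probs = softmax(lm_head(hidden))
        recursed = True

    # Greedy argmax keeps the whole step deterministic.
    token = max(range(len(probs)), key=probs.__getitem__)
    return token, recursed
```

Under these assumptions, calling `decode_step` with different `span` values on the same input gives one deterministic rollout per span, which is the sense in which varying $L$ replaces sampling temperature as the source of diversity.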