Training
Select to Think: Unlocking SLM Potential with Local Sufficiency
The article introduces Select to Think (S2T), a method that improves small language models (SLMs) by re-ranking their top-K next-token candidates according to the selections of a larger language model (LLM). A variant, S2T-Local, distills this selection logic into the SLM so that it can re-rank its own candidates without calling the LLM. For a 1.5B-parameter model, the article reports that the LLM's preferred token appears among the SLM's top-8 candidates 95% of the time, and that S2T-Local yields a 24.1% relative improvement in Math Avg. over greedy decoding. Practitioners can thus keep the efficiency of an SLM while maintaining high reasoning accuracy and reducing dependence on a costly LLM at inference time.
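The selection step and the hit-rate metric described above can be illustrated with a minimal sketch. This is not the article's implementation: the function names, the dictionary-of-logits representation, and the `selector` callback (standing in for either the LLM's choice or a distilled selection head) are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Sequence


def top_k_candidates(logits: Dict[str, float], k: int) -> List[str]:
    """Return the k highest-scoring tokens, best first."""
    return sorted(logits, key=lambda tok: logits[tok], reverse=True)[:k]


def select_next_token(
    logits: Dict[str, float],
    k: int,
    selector: Callable[[List[str]], str],
) -> str:
    """Pick the next token from the SLM's top-k candidates.

    `selector` is a hypothetical stand-in for whatever does the choosing
    (an LLM, or selection logic distilled into the SLM). If it returns a
    token outside the candidate set, fall back to the greedy choice.
    """
    candidates = top_k_candidates(logits, k)
    choice = selector(candidates)
    return choice if choice in candidates else candidates[0]


def hit_rate(
    logits_per_step: Sequence[Dict[str, float]],
    preferred_tokens: Sequence[str],
    k: int,
) -> float:
    """Fraction of steps where the preferred token is in the SLM's top-k."""
    if not preferred_tokens or len(logits_per_step) != len(preferred_tokens):
        raise ValueError("need one preferred token per step, and at least one step")
    hits = sum(
        preferred in top_k_candidates(logits, k)
        for logits, preferred in zip(logits_per_step, preferred_tokens)
    )
    return hits / len(preferred_tokens)


if __name__ == "__main__":
    # Toy scores for a single decoding step (made-up values).
    step_logits = {"a": 2.0, "b": 1.5, "c": 0.1}
    print(select_next_token(step_logits, k=2, selector=lambda cands: "b"))
    print(hit_rate([step_logits, step_logits], ["b", "c"], k=2))
```

With `k=1` the selector has nothing to choose from and the sketch reduces to greedy decoding, which is the baseline the reported improvement is measured against.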
llm, small-language-models, local-sufficiency