Research
Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling
This article presents Entropy-Guided Power Sampling (EGPS), a sampling method that improves the reasoning of base language models without any parameter updates. EGPS modifies the standard Metropolis-Hastings (MH) sampler to focus on high-entropy decision points, so sampling cost scales with entropy mass rather than sequence length. Evaluated on Qwen2.5-Math-7B, EGPS reports state-of-the-art accuracy on MATH500 (75.8%), HumanEval (62.2%), and GPQA (42.4%), with up to a 12.6x speedup over the standard MH approach.
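To make the idea of "focusing on high-entropy decision points" concrete, here is a minimal sketch, not the authors' implementation. It assumes that candidate positions are chosen with probability proportional to the entropy of the next-token distribution, and that the MH target is the base model's sequence probability raised to a power `alpha`. The summary above does not confirm either detail, and the function names (`token_entropy`, `position_distribution`, `log_accept_ratio`) are hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    # Terms with p == 0 contribute nothing (0 * log 0 is taken as 0).
    return -sum(p * math.log(p) for p in probs if p > 0)

def position_distribution(entropies, eps=1e-12):
    """Assumed selection rule: pick a position with probability
    proportional to its entropy (eps avoids division by zero)."""
    shifted = [h + eps for h in entropies]
    total = sum(shifted)
    return [h / total for h in shifted]

def log_accept_ratio(logp_new, logp_old, logq_fwd, logq_rev, alpha):
    """Log of the Metropolis-Hastings acceptance probability for an
    assumed target proportional to p(x)^alpha.

    logp_new / logp_old: base-model log-probability of the proposed
    and current sequences; logq_fwd / logq_rev: log-probability of the
    forward and reverse proposals, including the position choice."""
    return min(0.0, alpha * (logp_new - logp_old) + logq_rev - logq_fwd)

# Three toy next-token distributions: certain, uniform, and a coin flip.
step_probs = [
    [1.0, 0.0, 0.0, 0.0],
    [0.25, 0.25, 0.25, 0.25],
    [0.5, 0.5, 0.0, 0.0],
]
entropies = [token_entropy(p) for p in step_probs]
weights = position_distribution(entropies)
```

In this toy example the deterministic first step gets essentially zero selection weight, while the uniform second step gets the most. That is the intuition behind cost scaling with entropy mass rather than sequence length. Because the position choice depends on the current sequence, a correct MH sampler has to include it in both proposal terms passed to `log_accept_ratio`.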
sampling, entropy, language models