Training
Learning When to Sample: Confidence-Aware Selective Sampling for Efficient Chain-of-Thought Reasoning
The paper introduces a confidence-aware selective sampling framework that makes chain-of-thought (CoT) reasoning in large language models (LLMs) cheaper by deciding, per query at inference time, whether a single reasoning path suffices or multi-path sampling is needed. The decision draws on trajectory-level numeric features and sentence-level linguistic features. The method reports accuracy comparable to traditional multi-path approaches while cutting token usage by 71.7% and 36.6% on benchmarks that include MedQA and MathQA. For practitioners, this means lower reasoning cost without a loss in accuracy, which matters most when deploying LLMs under tight compute or budget constraints.
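The single-versus-multi-path decision described above can be sketched roughly as follows. This is a minimal illustration and not the paper's implementation: the `sample_path` and `score_confidence` callables, the fixed `threshold`, the majority-vote aggregation, and the toy `hedging_scorer` are all assumptions introduced here. The paper's own confidence model is built from trajectory-level numeric and sentence-level linguistic features and is not reproduced.

```python
from collections import Counter
from typing import Callable, Tuple

# Hypothetical interfaces (not from the paper):
# a sampler returns (reasoning_text, final_answer) for a question,
# and a scorer maps a reasoning trajectory to a confidence in [0, 1].
Sampler = Callable[[str], Tuple[str, str]]
Scorer = Callable[[str], float]


def selective_sample(
    question: str,
    sample_path: Sampler,
    score_confidence: Scorer,
    threshold: float = 0.8,   # assumed cut-off, chosen for illustration
    num_paths: int = 5,       # assumed multi-path budget
) -> Tuple[str, int]:
    """Return (answer, number_of_paths_sampled)."""
    # Always start with one chain-of-thought path.
    reasoning, answer = sample_path(question)

    # Confident enough: stop here and save the extra tokens.
    if score_confidence(reasoning) >= threshold:
        return answer, 1

    # Otherwise fall back to multi-path sampling. The first path is reused,
    # and answers are aggregated by majority vote (an assumed aggregation).
    answers = [answer] + [sample_path(question)[1] for _ in range(num_paths - 1)]
    majority, _ = Counter(answers).most_common(1)[0]
    return majority, num_paths


def hedging_scorer(reasoning: str) -> float:
    """Toy stand-in for a linguistic feature: penalise hedging phrases."""
    hedges = ("maybe", "not sure", "possibly")
    return 0.3 if any(h in reasoning.lower() for h in hedges) else 0.9
```

In this sketch the token savings come entirely from the early return: every query that clears the threshold costs one path instead of `num_paths`. How well that trades off against accuracy depends on the quality of the confidence scorer, which is where the paper's learned features would replace the toy heuristic.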
sampling, chain-of-thought, llm