Inference
Fractured Chain-of-Thought Reasoning
The paper introduces Fractured Sampling, an inference-time strategy for Chain-of-Thought (CoT) reasoning in large language models that draws final answers from truncated reasoning traces, not only from complete ones. The method balances three dimensions: the number of reasoning trajectories, the number of final solutions sampled per trajectory, and the depth at which reasoning is truncated. Across five reasoning benchmarks, it improves accuracy while using fewer tokens. The resulting accuracy-cost trade-off is substantially better than standard CoT sampling. This makes LLM deployment more efficient in latency-sensitive applications, which matters for practitioners who need strong performance under a limited compute budget.
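The three-axis idea can be illustrated with a small sketch. This is not the authors' implementation: `fractured_sampling`, `toy_trace`, and `toy_answer` are hypothetical names, the "model" is a random stub, and majority voting is assumed as the aggregation rule. Each of `n` trajectories is generated once, then `m` answers are sampled from each of several truncated prefixes (`depths`), and all answers are pooled into one vote.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# Hypothetical stand-ins for LLM calls (assumptions, not a real API):
# a trace generator returns a list of reasoning steps, and an answer
# function maps a (possibly truncated) prefix of steps to a final answer.
TraceFn = Callable[[str, random.Random], List[str]]
AnswerFn = Callable[[str, List[str], random.Random], str]


def fractured_sampling(question: str, generate_trace: TraceFn,
                       answer_from_prefix: AnswerFn, n: int, m: int,
                       depths: Sequence[int], seed: int = 0) -> Tuple[str, Counter]:
    """Pool answers from n trajectories x len(depths) prefixes x m samples."""
    if n < 1 or m < 1 or not depths:
        raise ValueError("n, m must be >= 1 and depths must be non-empty")
    rng = random.Random(seed)
    votes: Counter = Counter()
    for _ in range(n):
        # One full reasoning trace per trajectory; its prefixes are reused,
        # so truncated answers only cost the answer tokens.
        steps = generate_trace(question, rng)
        for d in depths:
            prefix = steps[:d]  # truncate the reasoning after d steps
            for _ in range(m):
                votes[answer_from_prefix(question, prefix, rng)] += 1
    # Assumed aggregation: majority vote over all pooled answers.
    answer, _ = votes.most_common(1)[0]
    return answer, votes


def toy_trace(question: str, rng: random.Random) -> List[str]:
    # Hypothetical fixed four-step trace.
    return [f"step {i}" for i in range(1, 5)]


def toy_answer(question: str, prefix: List[str], rng: random.Random) -> str:
    # Hypothetical: deeper prefixes are more likely to give the right answer.
    p_correct = 0.4 + 0.15 * len(prefix)
    return "42" if rng.random() < p_correct else str(rng.randint(0, 99))


if __name__ == "__main__":
    answer, votes = fractured_sampling("What is 6 * 7?", toy_trace, toy_answer,
                                       n=4, m=3, depths=[1, 2, 3, 4])
    print(answer, sum(votes.values()))
```

Setting `depths=[len(trace)]` and `m=1` reduces the sketch to ordinary self-consistency over full traces; adding shallower depths buys extra votes without generating new trajectories, which is the intuition behind the cost savings.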
reasoning, llm, chain-of-thought