Inference
MARS: Margin-Adversarial Risk-controlled Stopping for Parallel LLM Test-time Scaling
The article introduces MARS (Margin-Adversarial Risk-controlled Stopping), a method that makes parallel test-time scaling for large language models (LLMs) more efficient by stopping reasoning traces early without sacrificing accuracy. MARS uses a margin-adversarial stopping rule that estimates how likely the answers of still-active traces are to change, saving 25-47% of self-consistency tokens compared to existing methods while maintaining accuracy. For practitioners, this means lower inference cost in settings that still require high accuracy.
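To make the idea of margin-based early stopping concrete, here is a minimal Python sketch. It is not the MARS rule itself, which the summary does not spell out and which estimates how likely active traces are to change their answers. As a simplifying assumption, this sketch uses only the worst-case bound: stop once the leading answer's vote margin exceeds the number of traces still running, so that even if every active trace voted for the runner-up, the majority answer could not change. The function name `can_stop_early` and its parameters are hypothetical.

```python
from collections import Counter


def can_stop_early(finished_answers, num_active):
    """Worst-case margin check for majority voting (illustrative, not MARS).

    finished_answers: answers from traces that have already completed.
    num_active: number of traces still generating.
    Returns (stop, leader), where stop is True only if the current leader
    would still win even if every active trace voted for the runner-up.
    """
    if not finished_answers:
        return False, None
    top_two = Counter(finished_answers).most_common(2)
    leader, leader_votes = top_two[0]
    # With a single distinct answer so far, the strongest rival has 0 votes.
    runner_up_votes = top_two[1][1] if len(top_two) > 1 else 0
    margin = leader_votes - runner_up_votes
    # Assumption: adversarial case where all active traces back the runner-up.
    return margin > num_active, leader


# Usage: three votes for "42", one for "17", one trace still running.
stop, answer = can_stop_early(["42", "42", "42", "17"], num_active=1)
```

In this usage example the margin is 2 and only one trace is active, so the check allows stopping with "42" as the answer. A risk-controlled rule such as the one the summary describes would presumably stop earlier than this bound, by accepting a small, bounded probability that the answer changes.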
llm scaling test-time