Inference
Think Again or Think Longer? Selective Verification for Budget-Aware Reasoning
The paper introduces SEVRA (Selective Verification for Reasoning Allocation), a deployment framework that allocates test-time reasoning compute by deciding when to verify a frozen solver's initial output instead of verifying every answer. With the Qwen3-4B model, SEVRA reaches 76.3% accuracy while using 26.8% fewer post-generation tokens than constant verification (verifying every output). It also reduces harmful answer changes, i.e. revisions that turn a correct initial answer into an incorrect one. The results suggest that budget-aware reasoning matters: practitioners can improve both efficiency and accuracy by verifying outputs selectively rather than applying the same verification step to every problem.
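To make the comparison concrete, here is a minimal sketch of a selective-verification loop and the three quantities the summary reports: accuracy, relative post-generation token reduction against an always-verify baseline, and harmful answer changes. The summary does not say how SEVRA decides when to verify, so the confidence-threshold gate below is a hypothetical stand-in, not the paper's policy. The names `Attempt`, `selective_verify`, `token_reduction`, `harmful_changes`, and the toy verifier are all illustrative assumptions, and the numbers are made up.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    """Frozen solver's initial answer plus a hypothetical confidence score in [0, 1]."""
    initial_answer: str
    confidence: float


def selective_verify(
    attempts: List[Attempt],
    verify: Callable[[Attempt], Tuple[str, int]],
    threshold: float,
) -> Tuple[List[str], int]:
    """Verify only attempts below the (assumed) confidence threshold.

    Returns the final answers and the post-generation tokens spent on verification.
    """
    finals: List[str] = []
    tokens = 0
    for a in attempts:
        if a.confidence < threshold:
            revised, cost = verify(a)  # spend verification tokens, possibly revise
            finals.append(revised)
            tokens += cost
        else:
            finals.append(a.initial_answer)  # keep the initial answer, spend nothing
    return finals, tokens


def token_reduction(selective_tokens: int, always_tokens: int) -> float:
    """Relative reduction in post-generation tokens versus verifying every output."""
    return 1.0 - selective_tokens / always_tokens


def harmful_changes(attempts: List[Attempt], finals: List[str], gold: List[str]) -> int:
    """Count answers that were correct initially but incorrect after verification."""
    return sum(
        a.initial_answer == g and f != g for a, f, g in zip(attempts, finals, gold)
    )


def accuracy(finals: List[str], gold: List[str]) -> float:
    """Fraction of final answers that match the reference answers."""
    return sum(f == g for f, g in zip(finals, gold)) / len(gold)


# Toy data (made up, not from the paper).
attempts = [Attempt("42", 0.9), Attempt("7", 0.2), Attempt("3", 0.95), Attempt("10", 0.4)]
gold = ["42", "8", "3", "10"]


def toy_verify(a: Attempt) -> Tuple[str, int]:
    """Hypothetical verifier: fixes "7" -> "8" but wrongly edits "42" -> "41"; costs 100 tokens."""
    return {"7": "8", "42": "41"}.get(a.initial_answer, a.initial_answer), 100


sel_answers, sel_tokens = selective_verify(attempts, toy_verify, threshold=0.5)
# threshold=inf makes every attempt go through verification (constant verification).
all_answers, all_tokens = selective_verify(attempts, toy_verify, threshold=float("inf"))
```

In this toy run, the selective gate verifies two of four attempts (200 tokens versus 400, a 50% reduction). Because the high-confidence correct answer "42" is never sent to verification, the gate also avoids the harmful edit that the always-verify baseline makes. The design choice being illustrated is that skipping verification saves tokens and also removes chances for a verifier to overwrite a correct answer.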
reasoning, verification, budget