Training · arXiv cs.CL · 15 d ago

REVES: REvision and VErification-Augmented Training for Test-Time Scaling

The paper introduces REVES, a two-stage iterative framework that improves test-time scaling for large language models (LLMs) by training them to revise and verify their own answers. It combines online data augmentation with policy optimization, turning intermediate "near-miss" answers into revision prompts. The authors report a +6.5 point improvement over reinforcement learning baselines on the LiveCodeBench benchmark, using a comparatively small 4B-parameter model and fewer rollouts than traditional methods. Beyond correcting code, the approach is also reported to generalize to constraint-satisfaction problems, which makes it relevant to practitioners working on LLM reasoning and efficiency.
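To make the revise-and-verify idea concrete, here is a minimal sketch of a generic test-time loop in which a failed attempt and its verifier feedback are folded into a revision prompt. It is an illustration only, not the REVES training procedure or the authors' code: the names `build_revision_prompt`, `revise_until_verified`, `toy_generate`, and `toy_verify` are hypothetical, and the toy model and verifier stand in for a real LLM and a real test harness.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical helper: fold a failed attempt and its feedback into a new prompt.
def build_revision_prompt(task: str, attempt: str, feedback: str) -> str:
    return (
        f"{task}\n\n"
        f"Previous attempt:\n{attempt}\n\n"
        f"Verifier feedback:\n{feedback}\n\n"
        "Revise the attempt so that it passes verification."
    )

def revise_until_verified(
    task: str,
    generate: Callable[[str], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[Optional[str], List[str]]:
    """Generate, verify, and revise for at most `max_rounds` rounds.

    Returns the first verified answer (or None) and the list of all attempts.
    """
    prompt = task
    history: List[str] = []
    for _ in range(max_rounds):
        attempt = generate(prompt)
        history.append(attempt)
        ok, feedback = verify(attempt)
        if ok:
            return attempt, history
        # A "near-miss" becomes the seed of the next prompt.
        prompt = build_revision_prompt(task, attempt, feedback)
    return None, history

# Toy stand-ins (assumptions): a "model" that only succeeds after seeing feedback.
def toy_generate(prompt: str) -> str:
    return "42" if "too low" in prompt else "41"

def toy_verify(attempt: str) -> Tuple[bool, str]:
    return (True, "") if attempt == "42" else (False, "too low")

answer, attempts = revise_until_verified("Return the answer.", toy_generate, toy_verify)
```

In this toy run the first attempt fails verification and the second, conditioned on the feedback, passes. A real setup would replace `toy_verify` with unit tests or a constraint checker and `toy_generate` with a model call.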

training · llm · scaling
