Training
REVES: REvision and VErification-Augmented Training for Test-Time Scaling
The paper introduces REVES, a two-stage iterative framework that improves test-time scaling for large language models (LLMs) through revision and verification. The method combines online data augmentation with policy optimization to turn intermediate "near-miss" answers into effective revision prompts. On LiveCodeBench, it achieves a +6.5 point improvement over reinforcement learning baselines while using a smaller 4B-parameter model and fewer rollouts than traditional methods. Beyond strengthening correction in coding tasks, the approach also generalizes to constraint-satisfaction problems, which makes it relevant to practitioners working on LLM reasoning and efficiency.
training, llm, scaling
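
To make the revise-and-verify idea in the summary concrete, here is a minimal sketch of a test-time loop in which a failed attempt and its verifier feedback are folded into a revision prompt. It is an illustration only, not the paper's implementation: the function `revise_and_verify`, the prompt template, and the toy `toy_generate` and `toy_verify` stand-ins are all hypothetical, and the sketch covers only the inference-time loop, not the online data augmentation or policy optimization used in training.

```python
from typing import Callable, List, Tuple


def revise_and_verify(
    problem: str,
    generate: Callable[[str], str],
    verify: Callable[[str], Tuple[bool, str]],
    max_rounds: int = 3,
) -> Tuple[str, bool, List[str]]:
    """Draft an answer, then alternate verification and revision.

    Returns the last answer, whether it passed verification, and all attempts.
    """
    prompt = problem
    history: List[str] = []
    answer = ""
    for _ in range(max_rounds):
        answer = generate(prompt)
        history.append(answer)
        ok, feedback = verify(answer)
        if ok:
            return answer, True, history
        # Assumed template: fold the failed ("near-miss") attempt and the
        # verifier feedback into the prompt for the next round.
        prompt = (
            f"{problem}\n\nPrevious attempt:\n{answer}\n\n"
            f"Verifier feedback:\n{feedback}\n\nRevise the attempt."
        )
    return answer, False, history


def toy_generate(prompt: str) -> str:
    # Hypothetical stand-in for a model call: the first draft contains a bug,
    # and any prompt that includes a previous attempt yields the fixed version.
    if "Previous attempt" in prompt:
        return "def add(a, b):\n    return a + b"
    return "def add(a, b):\n    return a - b"


def toy_verify(answer: str) -> Tuple[bool, str]:
    # Hypothetical verifier: run the candidate code against one unit test.
    namespace: dict = {}
    exec(answer, namespace)
    result = namespace["add"](2, 3)
    if result == 5:
        return True, "all tests passed"
    return False, f"add(2, 3) returned {result}, expected 5"


answer, solved, history = revise_and_verify(
    "Write add(a, b).", toy_generate, toy_verify
)
```

With these toy stand-ins the first draft fails the unit test and the second round succeeds, so `history` holds two attempts. In a real setting `generate` would wrap a model call and `verify` would run a sandboxed test suite or a constraint checker.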