Training
Scaling LLM Reasoning from Minimal Labels: A Semi-Supervised Framework with a Lightweight Verifier
The paper introduces a semi-supervised framework for training large language models (LLMs) on reasoning tasks with minimal labeled data. A lightweight reasoning-correctness classifier verifies intermediate reasoning traces, and entropy-based confidence thresholds filter out samples the classifier is unsure about. On tasks such as Verifiable Math Problems and Question Answering on Image Scene Graphs, the approach reaches accuracy comparable to training with 10 to 15 times more labeled data. By reducing the need for extensive annotation, the method makes it cheaper to build reasoning resources and supports the development of more autonomous reasoning systems.
semi-supervised, LLM, reasoning
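The entropy-based filtering mentioned in the summary can be illustrated with a minimal sketch. The summary does not give the paper's verifier, entropy definition, or threshold, so everything here is an assumption: the verifier is taken to output a single probability that a trace is correct, confidence is measured with binary Shannon entropy, and the names `binary_entropy`, `filter_confident`, and `max_entropy` are hypothetical.

```python
import math


def binary_entropy(p: float) -> float:
    """Shannon entropy in bits of a Bernoulli(p) prediction (0 when p is 0 or 1)."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log2(p) + (1.0 - p) * math.log2(1.0 - p))


def filter_confident(samples, max_entropy=0.5):
    """Keep traces whose verifier prediction has entropy <= max_entropy.

    `samples` is a list of (trace, p_correct) pairs, where p_correct is the
    (assumed) verifier probability that the trace is correct. Each kept trace
    is paired with a pseudo-label: True if p_correct >= 0.5, else False.
    The 0.5-bit default threshold is illustrative, not taken from the paper.
    """
    kept = []
    for trace, p_correct in samples:
        if binary_entropy(p_correct) <= max_entropy:
            kept.append((trace, p_correct >= 0.5))
    return kept


# Hypothetical verifier outputs for three unlabeled reasoning traces.
samples = [("trace A", 0.97), ("trace B", 0.55), ("trace C", 0.04)]
print(filter_confident(samples))
```

In this sketch, a prediction near 0.5 has entropy close to 1 bit and is discarded, while predictions near 0 or 1 are kept as confident pseudo-labels. A stricter `max_entropy` trades fewer pseudo-labeled samples for lower label noise.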