Training
OPRD: On-Policy Representation Distillation
The paper presents On-Policy Representation Distillation (OPRD), which extends on-policy distillation by aligning student and teacher representations in hidden-state space at selected layers, rather than relying solely on output probabilities. The authors report that this reduces sampling variance and improves training efficiency, with a 1.44x speedup and 54% lower memory usage than top-k on-policy distillation. They also report stronger results on the AIME 2024/2025 and AIMO benchmarks, which makes OPRD relevant to practitioners distilling large language models.
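To make the idea of hidden-state alignment concrete, here is a minimal sketch of a representation-matching loss over selected layers. It is an illustration under assumptions, not the paper's formulation: the mean-squared-error objective, the per-layer linear projection from student width to teacher width, and the names `representation_loss`, `student_hidden`, `teacher_hidden`, and `projections` are all hypothetical. The hidden states are assumed to come from running both models on a sequence sampled from the student, which is what makes the setup on-policy.

```python
import numpy as np

def representation_loss(student_hidden, teacher_hidden, projections, layers):
    """Average MSE between projected student states and teacher states.

    student_hidden: dict mapping layer index -> array (seq_len, d_student)
    teacher_hidden: dict mapping layer index -> array (seq_len, d_teacher)
    projections:    dict mapping layer index -> array (d_student, d_teacher)
    layers:         layer indices to align (assumed chosen by the user)
    """
    total = 0.0
    for layer in layers:
        # Map student states into the teacher's hidden dimension.
        projected = student_hidden[layer] @ projections[layer]
        diff = projected - teacher_hidden[layer]
        total += float(np.mean(diff ** 2))
    return total / len(layers)

# Toy example: random arrays stand in for hidden states of one sampled sequence.
rng = np.random.default_rng(0)
layers = [2, 5]
student = {l: rng.normal(size=(8, 16)) for l in layers}  # seq_len=8, d_student=16
teacher = {l: rng.normal(size=(8, 32)) for l in layers}  # d_teacher=32
proj = {l: rng.normal(size=(16, 32)) * 0.1 for l in layers}

loss = representation_loss(student, teacher, proj, layers)
print(f"alignment loss: {loss:.4f}")
```

In this sketch the loss depends only on per-layer hidden states, so no distribution over the output vocabulary is needed to compute it. Whether that accounts for the memory savings reported above is not stated in this summary.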
distillation, representation, training, models