Training · arXiv cs.AI · 9 days ago

On-Policy Distillation with Curriculum Turn-level Guidance for Multi-turn Agents

The paper introduces Guided On-Policy Distillation (Guided-OPD), an algorithm for training smaller multi-turn agents. It targets the compounding errors that arise in on-policy distillation: when the student generates every turn itself, early mistakes push it into states the teacher rarely visits. Guided-OPD instead mixes teacher- and student-generated turns and gradually reduces teacher intervention over a curriculum, which keeps trajectories aligned with the teacher's state distribution. On ALFWorld, ScienceWorld, and WebShop, distilling Qwen3 students from a Qwen3-30B-A3B teacher, Guided-OPD improves Score by 21.1% and Success Rate by 25.5%, suggesting it could reduce inference costs in practical applications while maintaining performance.
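The summary describes the mechanism only at a high level, so the following is a rough Python sketch of turn-level mixing with a decaying teacher share, not the paper's implementation. Every detail is an assumption: the linear schedule in `teacher_prob`, the independent per-turn coin flip in `guided_rollout`, the hypothetical `Policy` and `EnvStep` interfaces, and the choice in `loss_mask` to apply the distillation loss only to student-written turns.

```python
import random
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical interfaces for illustration only (not from the paper):
# a policy maps the transcript so far to the next agent action, and
# env_step maps an action to the environment's next observation.
Policy = Callable[[List[str]], str]
EnvStep = Callable[[str], str]


def teacher_prob(step: int, total_steps: int,
                 p_start: float = 1.0, p_end: float = 0.0) -> float:
    """Assumed linear curriculum: chance that the teacher writes a given turn.

    Decays from p_start to p_end over total_steps training steps,
    then stays at p_end.
    """
    frac = min(max(step / max(total_steps, 1), 0.0), 1.0)
    return p_start + (p_end - p_start) * frac


@dataclass
class Turn:
    observation: str  # what the agent saw before acting
    action: str       # the action taken this turn
    author: str       # "teacher" or "student"


def guided_rollout(first_obs: str, teacher: Policy, student: Policy,
                   env_step: EnvStep, num_turns: int, p_teacher: float,
                   rng: random.Random) -> List[Turn]:
    """Roll out one episode, choosing the author of each turn independently."""
    transcript: List[str] = [first_obs]
    turns: List[Turn] = []
    obs = first_obs
    for _ in range(num_turns):
        # rng.random() is in [0, 1): p_teacher=1.0 always picks the teacher,
        # p_teacher=0.0 always picks the student.
        use_teacher = rng.random() < p_teacher
        policy = teacher if use_teacher else student
        action = policy(transcript)
        turns.append(Turn(obs, action, "teacher" if use_teacher else "student"))
        obs = env_step(action)
        transcript.extend([action, obs])
    return turns


def loss_mask(turns: List[Turn]) -> List[bool]:
    """Assumption: only student-written turns receive the distillation loss."""
    return [t.author == "student" for t in turns]


if __name__ == "__main__":
    rng = random.Random(0)
    teacher = lambda hist: "teacher-action"
    student = lambda hist: "student-action"
    env = lambda action: f"obs after {action}"
    for step in (0, 500, 1000):
        p = teacher_prob(step, total_steps=1000)
        turns = guided_rollout("start", teacher, student, env,
                               num_turns=4, p_teacher=p, rng=rng)
        print(step, p, [t.author for t in turns])
```

Early in training, with `p_teacher` near 1.0, rollouts stay close to the teacher's own trajectories; as the schedule decays toward 0.0, rollouts become fully student-generated, which is the regime plain on-policy distillation trains in from the start.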

agents · distillation · multi-turn
