Agents · arXiv cs.CL · 16 d ago

SAGE-OPD: Selective Agent-Guided Intervention for Multi-Turn On-Policy Distillation

The paper introduces SAGE-OPD, a selective intervention framework for multi-turn on-policy distillation (OPD) of language models. SAGE-OPD applies teacher supervision selectively based on environmental feedback, weights the token-level distillation loss by teacher confidence, and normalizes the loss so that its overall scale is preserved. In experiments, SAGE-OPD improves the ALFWorld unseen success rate by up to 13.3% over standard OPD, which suggests that targeted teacher intervention helps mitigate compounding errors in multi-turn interactions.
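The three ingredients named in the summary (selection, confidence weighting, normalization) can be illustrated with a small sketch. This is not the paper's formulation: the function name `selective_weighted_loss`, the per-token boolean `intervene` mask (assumed here to be derived from environment feedback, for example marking tokens in turns that failed), and the choice to rescale weights so they average 1 over the selected tokens are all assumptions made for illustration.

```python
from typing import Sequence


def selective_weighted_loss(
    token_losses: Sequence[float],
    teacher_confidence: Sequence[float],
    intervene: Sequence[bool],
) -> float:
    """Hypothetical selective, confidence-weighted distillation loss.

    token_losses:       per-token distillation loss (e.g. a KL term), assumed given.
    teacher_confidence: per-token teacher confidence in [0, 1] (assumed).
    intervene:          True where teacher supervision is applied (assumed to
                        come from environment feedback).
    """
    if not (len(token_losses) == len(teacher_confidence) == len(intervene)):
        raise ValueError("all inputs must have the same length")

    # Selection: tokens outside the intervention mask get zero weight.
    raw = [c if m else 0.0 for c, m in zip(teacher_confidence, intervene)]
    total = sum(raw)
    n_selected = sum(1 for m in intervene if m)
    if total == 0.0:
        # No intervention (or zero confidence everywhere): no teacher signal.
        return 0.0

    # Normalization (assumed form): rescale so the weights average 1 over the
    # selected tokens, keeping the loss on the scale of a plain mean.
    scale = n_selected / total
    weights = [w * scale for w in raw]

    return sum(w * l for w, l in zip(weights, token_losses)) / n_selected


if __name__ == "__main__":
    loss = selective_weighted_loss(
        token_losses=[1.0, 2.0, 3.0, 4.0],
        teacher_confidence=[0.5, 0.5, 0.9, 0.1],
        intervene=[False, False, True, True],
    )
    print(loss)
```

With uniform teacher confidence this reduces to the ordinary mean loss over the selected tokens, which is the sense in which the overall loss scale is kept in this sketch.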

multi-turn · distillation · intervention
