Agents · arXiv cs.AI · 9 d ago

PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents

PACT (Privileged Trace Co-Training) is a training framework for multi-turn tool-use agents that uses expert traces during optimization while still generating rollouts from the prompt alone. It combines a trace-conditioned reinforcement learning surrogate with a component-aware supervised fine-tuning loss, so the model benefits from expert guidance without having its trajectories constrained to follow the expert. In the reported experiments, PACT outperforms existing supervised fine-tuning and reinforcement learning baselines, suggesting it can improve both the training efficiency and the performance of multi-turn tool-use agents.

tool-use · reinforcement learning · multi-turn
