Agents
PACT: Privileged Trace Co-Training for Multi-Turn Tool-Use Agents
PACT (Privileged Trace Co-Training) is a training framework for multi-turn tool-use agents. It uses expert traces during optimization while keeping rollout generation prompt-only. To do this, PACT combines a trace-conditioned reinforcement learning surrogate with a component-aware supervised fine-tuning loss, so that the model benefits from expert guidance without having its own trajectories constrained. In experiments, PACT outperforms existing supervised fine-tuning and reinforcement learning baselines, which suggests it can improve both training efficiency and performance for multi-turn tool-use agents.
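The summary does not give PACT's loss formulas, so the following is only a minimal sketch of the general idea of pairing a reinforcement learning term with a component-aware supervised term. Everything here is an assumption made for illustration: the component labels (`tool_name`, `arguments`, `text`), their weights, the mixing coefficient `sft_coef`, and both function names are hypothetical and are not taken from the article.

```python
import math

# Hypothetical per-component weights; the article summary does not specify any.
COMPONENT_WEIGHTS = {"tool_name": 2.0, "arguments": 1.5, "text": 1.0}


def component_aware_nll(token_probs, components, weights=COMPONENT_WEIGHTS):
    """Weighted mean negative log-likelihood over expert-trace tokens.

    token_probs: model probability assigned to each expert token (0 < p <= 1).
    components:  component label for each token, used to look up its weight.
    """
    if len(token_probs) != len(components):
        raise ValueError("token_probs and components must have equal length")
    total = 0.0
    weight_sum = 0.0
    for prob, component in zip(token_probs, components):
        weight = weights[component]
        total += -weight * math.log(prob)
        weight_sum += weight
    # An empty trace contributes no loss.
    return total / weight_sum if weight_sum else 0.0


def co_training_loss(rl_loss, sft_loss, sft_coef=0.5):
    """Assumed combination: RL surrogate plus a scaled supervised term."""
    return rl_loss + sft_coef * sft_loss


if __name__ == "__main__":
    probs = [0.9, 0.6, 0.8]
    labels = ["tool_name", "arguments", "text"]
    sft = component_aware_nll(probs, labels)
    print(co_training_loss(rl_loss=1.2, sft_loss=sft))
```

In this sketch, an error on a heavily weighted component (here the tool name) raises the supervised term more than the same error on plain text, which is one plausible reading of "component-aware"; the actual weighting scheme in PACT may differ.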
tool-use, reinforcement learning, multi-turn