Agents
TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning
TRACE (Tree Rollout Allocation for Contrastive Exploration) is a framework for allocating rollout budget in reinforcement learning with verifiable rewards (RLVR), the training setup used to improve reasoning and agentic behavior in large language models. It uses a tree-structured rollout scheme that allocates budget not only to prompt roots but also to intermediate prefixes, which increases reward contrast and strengthens the policy-update signal. At equal sampling cost, TRACE reports a 2.8-point accuracy improvement for Qwen3-14B on multi-hop QA benchmarks, which makes it relevant to practitioners working on multi-turn agentic reinforcement learning.
reinforcement learning, budget allocation, agentic behavior
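To make the idea of spending rollout budget on intermediate prefixes concrete, here is a minimal toy sketch. It is not TRACE's actual allocation rule, which the summary above does not specify. It assumes each candidate branching point (the prompt root or an intermediate prefix) has a running count of verified successes, and it splits a fixed budget in proportion to a smoothed Bernoulli variance, so prefixes whose rollouts give mixed rewards receive more samples. The names `Prefix`, `contrast`, and `allocate_budget` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Prefix:
    """A candidate branching point: the prompt root or an intermediate prefix."""
    name: str
    successes: int  # verified-correct rollouts seen from this prefix so far
    trials: int     # total rollouts seen from this prefix so far

    def contrast(self) -> float:
        # Assumed proxy for reward contrast: Bernoulli variance p * (1 - p)
        # with Laplace smoothing. It peaks at p = 0.5, where rollouts from
        # this prefix produce a mix of correct and incorrect outcomes.
        p = (self.successes + 1) / (self.trials + 2)
        return p * (1.0 - p)


def allocate_budget(prefixes: list[Prefix], budget: int) -> dict[str, int]:
    """Split `budget` rollouts across prefixes in proportion to contrast."""
    scores = [p.contrast() for p in prefixes]
    total = sum(scores)  # strictly positive for non-empty input due to smoothing
    shares = [budget * s / total for s in scores]
    counts = [int(x) for x in shares]
    # Hand out the rollouts lost to rounding down, largest remainder first.
    leftover = budget - sum(counts)
    order = sorted(
        range(len(prefixes)),
        key=lambda i: shares[i] - counts[i],
        reverse=True,
    )
    for i in order[:leftover]:
        counts[i] += 1
    return {p.name: c for p, c in zip(prefixes, counts)}


# Hypothetical statistics: the root almost always succeeds, the last prefix
# almost always fails, and the middle prefix is the uncertain one.
candidates = [
    Prefix("root", successes=8, trials=8),
    Prefix("after_step_1", successes=4, trials=8),
    Prefix("after_step_2", successes=0, trials=8),
]
print(allocate_budget(candidates, budget=16))
```

In this sketch the uncertain middle prefix receives the largest share, because sampling from a point where outcomes are already all correct or all incorrect adds little contrast between rewards. A real system would also need to build the tree, choose prefix boundaries, and compute advantages, none of which is modeled here.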