AgentsarXiv cs.AI — 47 d ago

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

TRACE (Tree Rollout Allocation for Contrastive Exploration) is a new framework designed for efficient rollout budget allocation in reinforcement learning with verifiable rewards (RLVR), enhancing reasoning and agentic behavior in large language models. It introduces a tree-structured rollout approach that allocates budget not only to prompt roots but also to intermediate prefixes, improving reward contrast and policy-update signals. Empirically, TRACE demonstrates a 2.8-point accuracy improvement in Qwen3-14B Multi-Hop QA benchmarks at equal sampling costs, making it a significant advancement for practitioners focused on optimizing multi-turn agentic reinforcement learning strategies.

reinforcement learningbudget allocationagentic behaviorrelevance 0.60 · engagement 0.00

Read at source ↗← all news