ai-digest.dev
last updated 5 h ago
AgentsarXiv cs.AI 21 h ago

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

TRACE (Tree Rollout Allocation for Contrastive Exploration) is a new framework designed for efficient rollout budget allocation in reinforcement learning with verifiable rewards (RLVR), enhancing reasoning and agentic behavior in large language models. It introduces a tree-structured rollout approach that allocates budget not only to prompt roots but also to intermediate prefixes, improving reward contrast and policy-update signals. Empirically, TRACE demonstrates a 2.8-point accuracy improvement in Qwen3-14B Multi-Hop QA benchmarks at equal sampling costs, making it a significant advancement for practitioners focused on optimizing multi-turn agentic reinforcement learning strategies.

reinforcement learningbudget allocationagentic behaviorrelevance 0.00 · engagement 0.00
Read at source ↗← all news
TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning — AI News Digest