Agents · arXiv cs.CL · 8 d ago

CacheRL: Multi-Turn Tool-Calling Agents via Cached Rollouts and Hybrid Reward

CacheRL is a system for training small agent foundation models. It reaches 92% process accuracy on multi-step tool-calling tasks, close to GPT-5's 94%, while using 100 times less compute. It has three key components: a hybrid trajectory pipeline that incorporates LLM-generated reasoning; CacheAgentLoop, which eliminates live execution costs through a fuzzy cache; and a cache-tier-aware reward that dynamically adjusts its weights based on answer quality. The results suggest that knowledge transfer, data quality, and reward design are critical for building efficient small agent models.
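
The summary does not say how the fuzzy cache or the tier-aware reward are implemented, so the following is only a minimal sketch of one way the two ideas could fit together. Everything in it is an assumption made for illustration: the class `FuzzyToolCache`, the three tiers (`exact`, `fuzzy`, `miss`), the string-similarity threshold, the `ANSWER_WEIGHT` table, and the choice to key the weights on the cache tier are hypothetical and are not taken from the paper.

```python
import json
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class CacheHit:
    tier: str               # "exact", "fuzzy", or "miss" (hypothetical tier names)
    result: Optional[str]   # cached tool output, or None on a miss


class FuzzyToolCache:
    """Hypothetical cache that replays recorded tool outputs instead of executing tools."""

    def __init__(self, threshold: float = 0.85):
        self.threshold = threshold      # assumed similarity cutoff for a fuzzy hit
        self._store: dict = {}

    @staticmethod
    def _key(tool: str, args: dict) -> str:
        # Canonical string form of a tool call; sort_keys makes argument order irrelevant.
        return f"{tool}:{json.dumps(args, sort_keys=True)}"

    def put(self, tool: str, args: dict, result: str) -> None:
        self._store[self._key(tool, args)] = result

    def get(self, tool: str, args: dict) -> CacheHit:
        key = self._key(tool, args)
        if key in self._store:
            return CacheHit("exact", self._store[key])
        # Fall back to the most similar recorded call for the same tool.
        best_key, best_score = None, 0.0
        for candidate in self._store:
            if not candidate.startswith(tool + ":"):
                continue
            score = SequenceMatcher(None, key, candidate).ratio()
            if score > best_score:
                best_key, best_score = candidate, score
        if best_key is not None and best_score >= self.threshold:
            return CacheHit("fuzzy", self._store[best_key])
        return CacheHit("miss", None)


# Assumed weights: the less reliable the cached observation, the less the final
# answer counts and the more the reward leans on the tool-calling process.
ANSWER_WEIGHT = {"exact": 0.5, "fuzzy": 0.3, "miss": 0.0}


def hybrid_reward(process_score: float, answer_score: float, tier: str) -> float:
    """Blend process and answer scores (both in [0, 1]) with a tier-dependent weight."""
    w = ANSWER_WEIGHT[tier]
    return (1.0 - w) * process_score + w * answer_score


if __name__ == "__main__":
    cache = FuzzyToolCache()
    cache.put("search", {"query": "weather in Paris today"}, "sunny")
    hit = cache.get("search", {"query": "weather in Paris today?"})
    print(hit.tier, hybrid_reward(process_score=0.8, answer_score=1.0, tier=hit.tier))
```

In this reading, a rollout never calls a live tool: each call is answered from the cache, and the tier of the hit tells the reward how much to trust the resulting answer. A real system would likely use a stronger similarity measure than character-level matching.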

Tags: agents, tool-calling, reinforcement-learning
