Bittensor Agent Arenas as a Trajectory Primitive: Distilling a Shopping Agent from ShoppingBench Subnet Traces
The paper presents Bittensor Agent Arenas, a framework for generating incentive-aligned trajectories used to post-train small-model agents, and demonstrates it with the ShoppingBench agentic-commerce benchmark on ORO Subnet 15 (SN15). The authors introduce a structural-quality filter that retains agentic trajectories and discards sub-task trajectories from the training data. Training on the filtered data raises Qwen3-4B from 18.0% to 42.7% ASR, a gain of 24.7 percentage points that comes within 0.9 points of the synthetic-data SFT-only baseline of 43.6%. For practitioners, the work offers both a method and a data source for improving agent performance on multi-turn tasks, addressing limitations of current trajectory datasets.
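
The summary above does not state the filter's actual criteria, so the following is only a minimal sketch of what a structural-quality filter over trajectories could look like. The `Step` and `Trajectory` types, the `completed` flag, the `min_tool_calls` threshold, and the tool names are all hypothetical and chosen for illustration; they are not taken from the paper or from the SN15 trace format.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Step:
    role: str                        # e.g. "user", "assistant", "tool"
    tool_name: Optional[str] = None  # set when an assistant step calls a tool


@dataclass
class Trajectory:
    task_id: str
    steps: List[Step] = field(default_factory=list)
    completed: bool = False          # hypothetical: episode reached a final answer


def count_tool_calls(traj: Trajectory) -> int:
    """Count assistant steps that invoke a tool."""
    return sum(1 for s in traj.steps if s.role == "assistant" and s.tool_name)


def is_agentic(traj: Trajectory, min_tool_calls: int = 2) -> bool:
    """Assumed criterion: a completed episode with several tool calls.

    A short or unfinished trace is treated here as a sub-task trajectory.
    """
    return traj.completed and count_tool_calls(traj) >= min_tool_calls


def filter_trajectories(
    trajs: List[Trajectory], min_tool_calls: int = 2
) -> List[Trajectory]:
    """Keep only trajectories that pass the structural check."""
    return [t for t in trajs if is_agentic(t, min_tool_calls)]


# Toy data with made-up tool names.
full_episode = Trajectory(
    task_id="t1",
    steps=[
        Step("user"),
        Step("assistant", "search"),
        Step("tool"),
        Step("assistant", "view"),
        Step("tool"),
        Step("assistant"),
    ],
    completed=True,
)
single_lookup = Trajectory(
    task_id="t2",
    steps=[Step("user"), Step("assistant", "search"), Step("tool")],
    completed=True,
)
unfinished = Trajectory(
    task_id="t3",
    steps=[Step("user"), Step("assistant", "search"), Step("assistant", "view")],
    completed=False,
)

kept = filter_trajectories([full_episode, single_lookup, unfinished])
kept_ids = [t.task_id for t in kept]
```

In this toy setup only `t1` survives: `t2` makes a single tool call and `t3` never completes. A real filter would need criteria derived from the actual trace schema and validated against downstream ASR.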