RollArt: Disaggregated Multi-Task Agentic RL Training at Scale
RollArt is a system for multi-task agentic reinforcement learning (RL) that disaggregates the training pipeline across specialized hardware: prefill runs on compute-optimized GPUs, decoding on bandwidth-optimized GPUs, and environment execution on CPU clusters. Because generation, environment interaction, and reward scoring proceed independently, synchronization overhead drops and training throughput improves, yielding a 1.31–2.05x reduction in training time. The system was used to train a mixture-of-experts (MoE) model with hundreds of billions of parameters on an Alibaba cluster of more than 3,000 GPUs, which demonstrates that the approach scales for practitioners building large-scale RL systems.
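The stage disaggregation described above can be illustrated with a small sketch. This is not RollArt's implementation or API: the names `STAGE_POOL`, `stage`, `run_pipeline`, and `run` are hypothetical, the hardware pools are plain string labels, and `asyncio` queues stand in for whatever transport a real system would use between pools. The sketch only shows the general idea that each stage consumes from its own queue, so a later task can enter prefill while an earlier one is still in environment execution.

```python
import asyncio

# Hypothetical routing table: stage name -> hardware pool label.
# The labels mirror the summary above; nothing here touches real hardware.
STAGE_POOL = {
    "prefill": "compute-optimized GPU",
    "decode": "bandwidth-optimized GPU",
    "env": "CPU cluster",
}


async def stage(name, inbox, outbox, log):
    """Run one stage: pull a task, record it, pass it downstream."""
    while True:
        item = await inbox.get()
        if item is None:  # sentinel: shut down and tell the next stage
            await outbox.put(None)
            return
        await asyncio.sleep(0)  # placeholder for real work; yields control
        log.append((item, name))
        await outbox.put(item)


async def run_pipeline(task_ids):
    """Chain the stages with queues so each one runs independently."""
    names = list(STAGE_POOL)
    queues = [asyncio.Queue() for _ in range(len(names) + 1)]
    log = []
    workers = [
        asyncio.create_task(stage(n, queues[i], queues[i + 1], log))
        for i, n in enumerate(names)
    ]
    for task_id in task_ids:
        await queues[0].put(task_id)
    await queues[0].put(None)

    done = []
    while True:
        item = await queues[-1].get()
        if item is None:
            break
        done.append(item)
    await asyncio.gather(*workers)
    return done, log


def run(task_ids):
    """Synchronous entry point for the sketch."""
    return asyncio.run(run_pipeline(task_ids))


if __name__ == "__main__":
    finished, trace = run([0, 1, 2])
    for task_id, stage_name in trace:
        print(f"task {task_id}: {stage_name} on {STAGE_POOL[stage_name]}")
```

In this toy version each stage has a single worker, so tasks leave the pipeline in the order they entered. A real deployment would presumably size each pool separately to match the cost of its stage, which is the motivation the summary gives for disaggregation.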