Training · MarkTechPost · 13 d ago

Prime Intellect Releases prime-rl 0.6.0 to Train Trillion-Parameter MoE Models on Agentic RL Workloads

Prime Intellect has released prime-rl 0.6.0, an open framework for asynchronous reinforcement learning on trillion-parameter Mixture-of-Experts (MoE) models. The framework was used to train GLM-5 on software engineering (SWE) tasks with a maximum sequence length of 131k tokens, reaching sub-5-minute step times with 256 rollouts across 28 H200 nodes. Key optimizations include FP8 inference, Wide Expert Parallelism, and a combination of Fully Sharded Data Parallel (FSDP), expert parallelism (EP), and context parallelism (CP). Together these improve training efficiency and model performance, which makes the release relevant to practitioners running large-scale RL workloads.

prime_intellect · reinforcement_learning · moe
