Training
EvoTrainer: Co-Evolving LLM Policies and Training Harnesses for Autonomous Agentic Reinforcement Learning
EvoTrainer is an autonomous training framework that co-evolves large language model (LLM) policies and their training harnesses, addressing the limitations of static training setups in agentic reinforcement learning (RL). It uses empirical feedback to diagnose problems and revise the training strategy, and it is reported to outperform conventional human-engineered RL methods on mathematical reasoning, competitive programming, and software engineering tasks. The approach argues that policies and training environments should evolve together, which can make LLMs considerably more effective on complex, long-horizon tasks.
reinforcement-learning, autonomous-agents, llm