Agents · arXiv cs.CL · 14 days ago

Breaking the Solver Bottleneck: Training Task Generators at the Learnable Frontier

The paper introduces PROPEL, a framework for training the task generators that produce reinforcement learning (RL) training tasks. It targets a specific bottleneck. Judging whether a generated task is useful normally requires repeated rollouts of the target solver to estimate its pass rate, which is expensive. PROPEL instead trains a lightweight activation probe that reads a frozen generator model's activations and predicts the target solver's pass rate, so the generator can be trained without repeated solver rollouts. The approach raises the share of generated tasks that sit at the learnable frontier (tasks the solver can solve some of the time but not reliably) across several model sizes. For a Qwen2.5-3B-Instruct solver, for example, the solvable-task rate rises from 10.1% to 20.0%. The authors present this as a way to make RL training on coding and software engineering tasks more efficient.
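The summary does not say how the probe is built or how its prediction becomes a training signal, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes a linear logistic probe over a pooled activation vector, fit once on a batch of (activation, observed pass rate) pairs, and a hypothetical frontier reward `4p(1 - p)` that peaks at a 50% predicted pass rate. The names `LinearPassRateProbe` and `frontier_reward`, the reward shape, and the synthetic data are all illustrative and not taken from PROPEL.

```python
import math
import random
from typing import Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class LinearPassRateProbe:
    """Hypothetical linear probe: maps a frozen generator's pooled hidden
    activation vector to a predicted solver pass rate in (0, 1)."""

    def __init__(self, dim: int) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def predict(self, h: Sequence[float]) -> float:
        return sigmoid(sum(wi * hi for wi, hi in zip(self.w, h)) + self.b)

    def fit(self, acts: Sequence[Sequence[float]], pass_rates: Sequence[float],
            lr: float = 0.5, epochs: int = 500) -> None:
        # Full-batch gradient descent on binary cross-entropy with soft
        # targets (assumed: empirical pass rates from a one-off batch of
        # solver rollouts). The gradient w.r.t. the logit is (p - y).
        n = len(acts)
        for _ in range(epochs):
            gw = [0.0] * len(self.w)
            gb = 0.0
            for h, y in zip(acts, pass_rates):
                err = self.predict(h) - y
                for j, hj in enumerate(h):
                    gw[j] += err * hj
                gb += err
            self.w = [wi - lr * g / n for wi, g in zip(self.w, gw)]
            self.b -= lr * gb / n


def frontier_reward(p: float) -> float:
    # Assumed shaping: 4p(1-p) is 1.0 at p = 0.5 and 0.0 at p = 0 or p = 1,
    # so tasks that are always or never solved earn no reward.
    return 4.0 * p * (1.0 - p)


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for (activation, observed pass rate) pairs.
    acts = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
    rates = [sigmoid(2.0 * h[0] - 1.0 * h[1]) for h in acts]
    probe = LinearPassRateProbe(dim=2)
    probe.fit(acts, rates)
    # Score a new candidate task from its activations alone, no rollout.
    p_hat = probe.predict([0.1, 0.3])
    print(round(p_hat, 3), round(frontier_reward(p_hat), 3))
```

Once the probe is fit, scoring a candidate task costs one dot product over activations the generator already computes, rather than many solver rollouts. That cost difference is the efficiency argument the summary describes, whatever the paper's actual probe and reward look like.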

reinforcement learning · task generation · training
