Training · arXiv cs.CL · 14 d ago

Beyond Reward Engineering: A Data Recipe for Long-Context Reinforcement Learning

The paper takes a data-centric approach to improving long-context reasoning in reinforcement learning (RL) for large language models. Its data recipe comprises eight curated datasets totaling approximately 14,000 examples. On Qwen3-4B, Qwen3-8B, and Qwen3-30B-A3B, the recipe yields average gains of +7.2, +3.2, and +6.4 points, respectively, across seven long-context benchmarks. The gains extend to agentic tasks, where GAIA improves by +4.8 points and BrowseComp by +7.0 points. The datasets are released to support further research, and the work emphasizes diverse training data over traditional reward engineering.

reinforcement-learning · long-context · data
