Training · arXiv cs.CL · 7 d ago

Multi-Turn Reasoning When Context Arrives in Pieces: Scalable Sharding and Memory-Augmented RL

The paper introduces a scalable sharding pipeline for training models on multi-turn episodes in which information arrives in fragments, targeting the accuracy drop that LLMs show when context is provided in pieces rather than all at once. Using memory-augmented reinforcement learning (RL) on a sharded version of GSM8K, the authors report improved multi-turn accuracy and zero-shot generalization to complex math problems and long-context QA. The results suggest that maintaining a compact rolling memory supports incremental reasoning and can be a more effective alternative to training on the full conversation history.
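
To make the setup concrete, here is a minimal sketch of a sharded episode with a rolling memory. It is an illustration under stated assumptions, not the authors' pipeline: the function names (`shard_problem`, `truncate_memory`, `run_episode`), the sentence-level sharding rule, the prompt layout, and the 200-character memory budget are all hypothetical. In particular, the paper trains the memory behavior with RL, whereas the update below is a plain truncation placeholder.

```python
import re
from typing import Callable, List, Tuple


def shard_problem(problem: str) -> List[str]:
    """Split a word problem into sentence-level shards.

    Assumption: one sentence per turn. The paper's sharding rule may differ.
    """
    parts = re.split(r"(?<=[.?!])\s+", problem.strip())
    return [p for p in parts if p]


def truncate_memory(memory: str, shard: str, max_chars: int = 200) -> str:
    """Placeholder memory update: append the shard, keep the last max_chars.

    Assumption: stands in for a learned (RL-trained) memory writer.
    """
    combined = (memory + " " + shard).strip()
    return combined[-max_chars:]


def run_episode(
    shards: List[str],
    update_memory: Callable[[str, str], str] = truncate_memory,
) -> Tuple[List[str], str]:
    """Build one prompt per turn from the compact memory plus the new shard.

    Each prompt contains only the memory and the current shard,
    never the full turn history.
    """
    memory = ""
    prompts = []
    for shard in shards:
        prompts.append(f"Memory: {memory}\nNew information: {shard}")
        memory = update_memory(memory, shard)
    return prompts, memory


if __name__ == "__main__":
    problem = "Tom has 3 apples. He buys 2 more. How many apples does he have?"
    prompts, final_memory = run_episode(shard_problem(problem))
    for turn, prompt in enumerate(prompts, start=1):
        print(f"--- turn {turn} ---\n{prompt}")
```

The point of the design is that the prompt size stays bounded by the memory budget plus one shard, regardless of how many turns the episode has.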

memory-augmented · multi-turn · reasoning
