Training · arXiv cs.AI · 8 d ago

On the Optimal Reasoning Length for RL-Trained Language Models

The paper investigates how output length affects the accuracy of language models trained with reinforcement learning (RL). It reports a non-monotonic relationship: accuracy peaks at an intermediate output length. In controlled experiments on several base models, sample accuracy plateaus or declines as outputs grow longer, while mode accuracy continues to improve. For practitioners, the finding suggests that output length can be tuned to improve reasoning performance without incurring excessive computational cost.
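The summary does not define its two metrics, so the following is only a minimal sketch under an assumed reading: sample accuracy is taken to be the fraction of individual sampled answers that are correct, and mode accuracy to be whether the most frequent sampled answer is correct. The function names and the toy answers are hypothetical, and the paper's own definitions may differ.

```python
from collections import Counter


def sample_accuracy(answers, correct):
    """Fraction of individual sampled answers equal to the reference (assumed definition)."""
    return sum(a == correct for a in answers) / len(answers)


def mode_correct(answers, correct):
    """True if the most frequent sampled answer equals the reference (assumed definition)."""
    most_common_answer, _ = Counter(answers).most_common(1)[0]
    return most_common_answer == correct


# Toy data: eight sampled final answers to one question whose reference answer is "42".
samples = ["42", "41", "42", "17", "42", "41", "42", "13"]

print(sample_accuracy(samples, "42"))  # 0.5
print(mode_correct(samples, "42"))     # True
```

Under this reading the two metrics can diverge: only half of the individual samples are right, yet the most common answer is still the correct one.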

reinforcement learning · reasoning · length
