ai-digest.dev
last updated 4 h ago
Training · arXiv cs.AI · 10 d ago

AC-ODM: Actor-Critic Online Data Mixing for Sample-Efficient LLM Pretraining

The paper introduces Actor-Critic Online Data Mixing (AC-ODM), an LLM pretraining method that uses actor-critic reinforcement learning to adjust the data mixture (the sampling proportions across training-data domains) dynamically during training. AC-ODM supports two modes: a proxy mode, which transfers a mixing policy learned with a smaller model to a larger one, and a non-proxy mode, which trains the target model from scratch without a proxy. In the reported experiments on Pythia-1B, AC-ODM reaches optimal validation perplexity in 66% fewer training steps than existing methods, improves MMLU accuracy by 27.5%, and achieves 2.23x higher pass@1 on HumanEval, while adding only minimal computational overhead.
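The summary does not describe AC-ODM's state, reward, or network design. As a rough illustration of the general actor-critic idea behind online data mixing, here is a minimal sketch under simplifying assumptions: the setting is stateless (a multi-armed bandit over data domains), the actor keeps one logit per domain and samples the next batch's domain from their softmax, the critic is a single scalar baseline, and the reward comes from a hypothetical `toy_reward` that stands in for a real training signal such as a drop in loss. The names `toy_reward` and `train_mixer`, the gain values, and the learning rates are illustrative assumptions, not the paper's implementation.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax: turns actor logits into mixing weights."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_reward(domain, gains, rng):
    """HYPOTHETICAL stand-in for a training signal (e.g. loss reduction after
    a step on `domain`): a fixed per-domain mean plus Gaussian noise."""
    return gains[domain] + rng.gauss(0.0, 0.1)


def train_mixer(gains=(0.2, 0.5, 0.9), steps=2000,
                actor_lr=0.1, critic_lr=0.05, seed=0):
    """Stateless actor-critic over data domains; returns final mixing weights."""
    rng = random.Random(seed)
    num_domains = len(gains)
    logits = [0.0] * num_domains  # actor: one logit per domain
    value = 0.0                   # critic: scalar baseline estimate of reward
    for _ in range(steps):
        weights = softmax(logits)
        # Actor samples which domain the next training batch comes from.
        domain = rng.choices(range(num_domains), weights=weights)[0]
        reward = toy_reward(domain, gains, rng)
        # One-step TD error with no successor state reduces to reward - V.
        advantage = reward - value
        value += critic_lr * advantage
        # Policy-gradient step: d log softmax(a) / d logit_k = 1[k == a] - w_k.
        for k in range(num_domains):
            grad = (1.0 if k == domain else 0.0) - weights[k]
            logits[k] += actor_lr * advantage * grad
    return softmax(logits)


weights = train_mixer()
print([round(w, 3) for w in weights])  # most mass shifts to the highest-gain domain
```

Loosely mirroring the proxy mode described above, one could run such a mixer cheaply and reuse the returned weights for a larger run; how AC-ODM actually transfers its policy is not covered by this summary.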

LLM · pretraining · reinforcement-learning