Training · arXiv cs.CL · 14 d ago

RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

RegMix-D introduces a dynamic data-mixing approach for large language model (LLM) pretraining that extends the static mixture selection of its predecessor, RegMix. It fits a regression model on the full loss trajectories of proxy training runs, which lets it predict an optimal data mixture at different stages of training. It can be deployed in either an offline or an online mode. In experiments with a 1B-parameter model trained on the Pile, RegMix-D outperformed both RegMix and DoReMi across 13 tasks while requiring a smaller proxy compute budget.
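The core idea of regressing on proxy loss trajectories to choose a stage-dependent mixture can be sketched in a few lines. This is a hypothetical illustration, not the paper's method: the feature map, the plain least-squares regressor, the Dirichlet candidate search, the checkpoint steps and the synthetic "proxy runs" are all assumptions made for the example. Each proxy run contributes one training row per logged checkpoint, and the fitted model is then queried at a chosen step to pick the candidate mixture with the lowest predicted loss.

```python
import numpy as np

# Illustrative sketch only: the features, the least-squares regressor and the
# synthetic proxy runs are assumptions, not RegMix-D's actual implementation.
K = 3                                        # number of data domains (hypothetical)
STEPS = np.array([1000, 2000, 4000, 8000])   # checkpoints logged per proxy run (hypothetical)

def features(w, step):
    """Map mixture weights w of shape (n, K) at a training step to regression features."""
    w = np.atleast_2d(np.asarray(w, dtype=float))
    t = np.full((w.shape[0], 1), np.log(step))
    ones = np.ones_like(t)
    # Linear, quadratic and step-interaction terms let the optimum move over training.
    return np.hstack([w, w ** 2, w * t, t, t ** 2, ones])

def fit_regressor(mixtures, losses):
    """Least-squares fit of loss on features over every (run, checkpoint) pair.

    mixtures: (R, K) proxy-run weights; losses: (R, S) loss trajectories.
    """
    X = np.vstack([features(mixtures, s) for s in STEPS])  # rows ordered step-major
    y = losses.T.reshape(-1)                                # same step-major order
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def best_mixture(coef, step, n_candidates=5000, seed=0):
    """Return the sampled candidate mixture with the lowest predicted loss at `step`."""
    rng = np.random.default_rng(seed)
    cands = rng.dirichlet(np.ones(K), size=n_candidates)
    return cands[np.argmin(features(cands, step) @ coef)]

def synthetic_loss(w, step, rng):
    """Toy ground truth in which the ideal mixture drifts from domain 0 toward domain 2."""
    t = np.log(step)
    a = (t - np.log(STEPS[0])) / (np.log(STEPS[-1]) - np.log(STEPS[0]))
    target = (1 - a) * np.array([0.6, 0.3, 0.1]) + a * np.array([0.2, 0.3, 0.5])
    return 4.0 - 0.2 * t + np.sum((w - target) ** 2, axis=1) + rng.normal(0, 0.005, len(w))

rng = np.random.default_rng(42)
mixtures = rng.dirichlet(np.ones(K), size=32)                                  # 32 proxy runs
losses = np.stack([synthetic_loss(mixtures, s, rng) for s in STEPS], axis=1)   # (R, S)
coef = fit_regressor(mixtures, losses)
early, late = best_mixture(coef, STEPS[0]), best_mixture(coef, STEPS[-1])
print("early-stage mixture:", early.round(2), "late-stage mixture:", late.round(2))
```

Because every checkpoint of every proxy run becomes a training example, the same handful of runs yields a model that can be queried at any logged stage. A static approach would instead regress only on one loss per run and return a single mixture for the whole of training.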

data-mixing · llm · training
