Training
RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories
RegMix-D is a dynamic data mixing approach for large language model (LLM) pretraining that extends the static mixture selection of its predecessor, RegMix. It fits a regression model on full loss trajectories from proxy runs, which lets it predict the optimal data mixture at different stages of training, and it can be deployed offline or online. In experiments with a 1B-parameter model on the Pile, RegMix-D outperformed RegMix and DoReMi across 13 tasks while using a smaller proxy compute budget.
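The idea of regressing on proxy loss trajectories to obtain a per-stage mixture can be illustrated with a small sketch. It is not the authors' implementation: the synthetic loss function, the quadratic features, the least-squares fit, and the random search over candidate mixtures are all assumptions chosen to keep the example self-contained, and names such as `synthetic_loss` and `schedule` are hypothetical. Only an offline-style schedule is shown.

```python
import numpy as np

rng = np.random.default_rng(0)

N_DOMAINS = 3   # hypothetical number of data domains
N_PROXY = 64    # hypothetical number of proxy runs
N_STAGES = 4    # checkpoints at which each proxy run's loss is recorded


def synthetic_loss(mix, stage):
    """Toy stand-in for a proxy run's validation loss at one checkpoint."""
    # Assumption: the most helpful domain shifts as training progresses.
    target = np.roll(np.array([0.6, 0.3, 0.1]), stage % N_DOMAINS)
    return 2.0 - 0.2 * stage + np.sum((mix - target) ** 2)


# 1. Sample proxy mixtures on the simplex and record full loss trajectories.
mixtures = rng.dirichlet(np.ones(N_DOMAINS), size=N_PROXY)  # (N_PROXY, N_DOMAINS)
trajectories = np.array(
    [[synthetic_loss(m, s) for s in range(N_STAGES)] for m in mixtures]
)  # (N_PROXY, N_STAGES)


def features(mix):
    """Quadratic features: bias, mixture weights, squared mixture weights."""
    mix = np.atleast_2d(mix)
    return np.hstack([np.ones((len(mix), 1)), mix, mix ** 2])


# 2. Fit one least-squares regressor per stage: mixture -> loss at that stage.
X = features(mixtures)
coefs, *_ = np.linalg.lstsq(X, trajectories, rcond=None)  # (n_features, N_STAGES)

# 3. Score many candidate mixtures per stage and keep the predicted best one.
candidates = rng.dirichlet(np.ones(N_DOMAINS), size=20000)
predicted = features(candidates) @ coefs              # (20000, N_STAGES)
schedule = candidates[np.argmin(predicted, axis=0)]   # (N_STAGES, N_DOMAINS)
```

Each row of `schedule` is the mixture predicted to minimize loss at one stage, so reading the rows in order gives a stage-wise mixing schedule. In a real setting the trajectories would come from actual proxy training runs, and the regressor and search procedure would likely be more elaborate than this toy version.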
data-mixing llm training