ai-digest.dev
Training · arXiv cs.CL · 14 d ago

RegMix-D: Dynamic Data Mixing via Proxy Training Trajectories

RegMix-D is a dynamic data-mixing method for large language model pretraining that extends the static mixture selection of its predecessor, RegMix. It fits a regression model on the full loss trajectories of proxy runs, which lets it predict the optimal data mixture at different stages of training, and it can be deployed in either an offline or an online mode. In experiments with a 1B-parameter model trained on the Pile, RegMix-D outperformed both RegMix and DoReMi across 13 tasks while using a smaller proxy compute budget.
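
To make the idea concrete, here is a minimal sketch of per-stage mixture prediction from proxy loss trajectories. It is not the paper's implementation: the summary does not say which regression model, features, or search procedure RegMix-D uses, so the quadratic feature map, the least-squares fit, the Dirichlet candidate search, and every function name below are assumptions chosen for illustration. The proxy data is synthetic.

```python
import numpy as np


def features(mixtures):
    # Hypothetical feature map (an assumption, not from the paper):
    # each domain weight and its square.
    return np.hstack([mixtures, mixtures ** 2])


def fit_stage_regressors(mixtures, trajectories):
    """Fit one least-squares model per checkpoint.

    mixtures:     (n_runs, n_domains) mixture weights, each row sums to 1
    trajectories: (n_runs, n_stages) proxy loss at each checkpoint
    returns:      (2 * n_domains, n_stages) coefficients, one column per stage
    """
    coef, *_ = np.linalg.lstsq(features(mixtures), trajectories, rcond=None)
    return coef


def best_mixture_per_stage(coef, n_domains, n_candidates=4096, seed=0):
    """Sample candidate mixtures and keep the lowest predicted loss per stage."""
    rng = np.random.default_rng(seed)
    candidates = rng.dirichlet(np.ones(n_domains), size=n_candidates)
    predicted = features(candidates) @ coef      # (n_candidates, n_stages)
    return candidates[predicted.argmin(axis=0)]  # (n_stages, n_domains)


# Synthetic toy data: 64 proxy runs, 3 domains, 2 checkpoints.
# The "true" loss at each stage is the squared distance to a stage-specific
# target mixture; these numbers are made up for the demo.
rng = np.random.default_rng(1)
targets = np.array([[0.6, 0.3, 0.1],
                    [0.2, 0.3, 0.5]])
proxy_mixtures = rng.dirichlet(np.ones(3), size=64)
proxy_losses = ((proxy_mixtures[:, None, :] - targets[None]) ** 2).sum(axis=2)

coef = fit_stage_regressors(proxy_mixtures, proxy_losses)
schedule = best_mixture_per_stage(coef, n_domains=3)
print(schedule)  # one predicted mixture per checkpoint
```

Fitting a separate model for each checkpoint is what lets the predicted mixture change over the course of training; a static approach would fit only the final-loss column and return a single mixture.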

data-mixing · llm · training