Training
FastMix: Fast Data Mixture Optimization via Gradient Descent
FastMix is a framework for optimizing data mixtures when training large models. It automates the discovery of optimal data combinations while training only a single proxy model. It reformulates mixture selection as a bilevel optimization problem, so that mixture coefficients and model parameters are optimized simultaneously by gradient descent. The authors report that this approach improves efficiency and scalability, outperforms existing baselines in both pre-training and post-training scenarios, and reduces the computational cost of data mixture search.
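To make the bilevel idea concrete, here is a minimal toy sketch of a generic one-step-unrolled bilevel scheme, not the FastMix algorithm itself, whose details are not given in this summary. It assumes linear regression sources and a softmax parameterization of the mixture weights. All names (`optimize_mixture`, `make_source`, `lr_w`, `lr_alpha`) are hypothetical. The inner step updates the model on the mixture-weighted training loss, and the outer step moves the mixture logits along the gradient of a held-out validation loss taken through that single model update.

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def make_source(rng, true_w, n=200, noise=0.1):
    # Synthetic regression data: y = X @ true_w + Gaussian noise.
    X = rng.normal(size=(n, true_w.size))
    y = X @ true_w + noise * rng.normal(size=n)
    return X, y

def grad_mse(w, X, y):
    # Gradient of 0.5 * mean((X @ w - y) ** 2) with respect to w.
    return X.T @ (X @ w - y) / len(y)

def optimize_mixture(sources, val, steps=300, lr_w=0.1, lr_alpha=1.0):
    """Jointly update model weights w and mixture logits alpha (toy sketch)."""
    w = np.zeros(sources[0][0].shape[1])
    alpha = np.zeros(len(sources))      # logits; softmax(alpha) is the mixture
    Xv, yv = val
    for _ in range(steps):
        p = softmax(alpha)
        # Per-source training gradients, shape (num_sources, dim).
        G = np.stack([grad_mse(w, X, y) for X, y in sources])
        # Inner step: one gradient step on the mixture-weighted training loss.
        w_new = w - lr_w * (p @ G)
        # Outer step: differentiate the validation loss at w_new with respect
        # to p through the single inner update:
        #   dL_val/dp_k = -lr_w * <grad_val(w_new), G_k>
        g_val = grad_mse(w_new, Xv, yv)
        dp = -lr_w * (G @ g_val)
        # Chain rule through the softmax to get the gradient for the logits.
        dalpha = p * (dp - p @ dp)
        alpha = alpha - lr_alpha * dalpha
        w = w_new
    return softmax(alpha), w

# Usage: source 0 matches the validation task, source 1 conflicts with it.
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0, 0.5])
sources = [make_source(rng, target), make_source(rng, -target)]
val = make_source(rng, target)
weights, w = optimize_mixture(sources, val)
print("mixture weights:", weights)
```

In this toy setup the learned mixture should shift toward the source that agrees with the validation task. A real system would use stochastic minibatches and a neural proxy model, and might approximate the outer gradient differently.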
data optimization, mixture, gradient descent