Training
A Theory of Training Profit-Optimal LLMs
The paper presents a theoretical framework, grounded in economic principles, for optimizing the training of large language models (LLMs). It focuses on the trade-offs among model size, number of training tokens, and the associated costs. It establishes that in a compute-bound regime the optimal model size and token budget should be set according to hardware efficiency, whereas in a data-bound regime training expenditure scales quadratically with the amount of available data and inversely with hardware efficiency. The framework gives practitioners a basis for making informed economic decisions about LLM training investments, and it highlights the need to balance quality improvements against cost efficiency.
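To make the data-bound scaling statement concrete, the sketch below treats it as a pure proportionality, cost = k * D^2 / eta, where D is the amount of available data and eta is hardware efficiency. This is an illustrative assumption, not the paper's actual cost model. The function name relative_training_cost and the constant k are hypothetical, and only ratios between calls are meaningful.

```python
def relative_training_cost(data_tokens: float, hw_efficiency: float, k: float = 1.0) -> float:
    """Modeled training cost in a data-bound regime: k * D^2 / eta.

    Assumption: the stated scaling (quadratic in data, inverse in hardware
    efficiency) is a pure proportionality. k is a hypothetical constant,
    so only ratios between results are meaningful.
    """
    if data_tokens <= 0 or hw_efficiency <= 0:
        raise ValueError("data_tokens and hw_efficiency must be positive")
    return k * data_tokens**2 / hw_efficiency


base = relative_training_cost(1e12, 1.0)

# Doubling the available data quadruples the modeled cost.
print(relative_training_cost(2e12, 1.0) / base)  # 4.0

# Doubling hardware efficiency halves the modeled cost.
print(relative_training_cost(1e12, 2.0) / base)  # 0.5
```

Under this assumption, data growth dominates cost: a 10x increase in data raises the modeled expenditure 100x, which only a 100x gain in hardware efficiency would offset.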
llm, profit, scaling laws