Training
Small Experiments, Cheaper Decisions: A Case Study in Staged Promotion for Micro-Pretraining
The study presents a staged promotion protocol for micro-pretraining that uses fixed budgets ranging from 2 minutes to 12 hours across different configurations on Windows A100 and Linux L40S systems. The findings indicate that short early pretraining runs can misrepresent model performance. The study emphasizes operational promotion evidence rather than seed-based performance curves. The protocol consumed 169.2 GPU-hours in total and offers practitioners a cost-effective way to validate configurations while limiting resource expenditure. It does not claim superiority over existing adaptive hyperparameter optimization techniques.
micro-pretraining, staged promotion, experimental cost
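
The summary above does not spell out the promotion rule, so the following is only a minimal sketch of what a staged promotion loop with fixed budgets might look like. Everything beyond the 2-minute and 12-hour endpoints is an assumption made for illustration: the intermediate 60-minute stage, the keep counts, the "lowest loss advances" rule, one GPU per run, and the toy loss curves (which are synthetic, not results from the study).

```python
import math
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Stage:
    name: str
    budget_minutes: float  # fixed wall-clock budget for each run at this stage
    keep: int              # number of configurations promoted to the next stage


def run_staged_promotion(
    configs: List[str],
    stages: List[Stage],
    train_and_score: Callable[[str, float], float],
) -> Tuple[List[str], float, List[Tuple[str, List[str]]]]:
    """Run every surviving config at each stage's budget and promote the best.

    Returns the final survivors, total GPU-hours (assuming one GPU per run),
    and the list of survivors after each stage.
    """
    survivors = list(configs)
    gpu_minutes = 0.0
    history = []
    for stage in stages:
        scored = []
        for cfg in survivors:
            loss = train_and_score(cfg, stage.budget_minutes)
            gpu_minutes += stage.budget_minutes
            scored.append((loss, cfg))
        scored.sort(key=lambda pair: pair[0])  # lower loss ranks first
        survivors = [cfg for _, cfg in scored[: stage.keep]]
        history.append((stage.name, list(survivors)))
    return survivors, gpu_minutes / 60.0, history


# Hypothetical toy curves: loss(t) = floor + gap / (1 + speed * t).
# "fast_plateau" looks best after 2 minutes but is overtaken later.
TOY_CURVES = {
    "fast_plateau": (3.0, 2.0, 1.0),
    "slow_strong": (2.5, 2.0, 0.05),
    "weak": (3.5, 2.0, 0.01),
}


def toy_train_and_score(cfg: str, minutes: float) -> float:
    floor, gap, speed = TOY_CURVES[cfg]
    return floor + gap / (1.0 + speed * minutes)


# Stage budgets: only the 2-minute and 12-hour endpoints come from the summary.
STAGES = [
    Stage("smoke", 2, keep=2),
    Stage("medium", 60, keep=2),   # assumed intermediate budget
    Stage("long", 12 * 60, keep=1),
]

survivors, gpu_hours, history = run_staged_promotion(
    list(TOY_CURVES), STAGES, toy_train_and_score
)
print(survivors, round(gpu_hours, 1), history)
```

In this toy setup the 2-minute ranking favors a configuration that is later overtaken, which mirrors the summary's point that short early runs can misrepresent performance. Keeping more than one survivor at the cheap stages is one way to hedge against that, at the cost of extra GPU time in later stages.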