Training
Small LLMs: Pruning vs. Training from Scratch
This study compares pruning with training from scratch as ways to obtain small language models, using Llama-3.1-8B as the source model and pruning ratios from 0.5 to 0.8. Pruned models consistently outperform their randomly initialized counterparts, especially when the training token budget is limited. At finer pruning granularities, pruned models keep their advantage even when the full token budget is available. For practitioners with constrained resources, pruning a large pretrained model is therefore generally the more effective choice; at coarser pruning granularities, training from scratch can become viable given enough training data.
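To make the two starting points concrete, here is a minimal sketch of what a pruning ratio means for a single feed-forward block. The summary does not say which importance criterion or granularity the study used, so this assumes neuron-level pruning scored by weight L2 norm. The function names (`neuron_importance`, `prune_hidden`, `random_init`) and the toy dimensions are hypothetical and chosen only for illustration.

```python
import random

def neuron_importance(w_in, w_out):
    """Score each hidden neuron by the L2 norm of its incoming and outgoing
    weights. This criterion is an assumption, not the study's method."""
    scores = []
    for j, row in enumerate(w_in):
        in_sq = sum(x * x for x in row)
        out_sq = sum(out_row[j] ** 2 for out_row in w_out)
        scores.append((in_sq + out_sq) ** 0.5)
    return scores

def prune_hidden(w_in, w_out, ratio):
    """Remove the `ratio` fraction of hidden neurons with the lowest scores.

    w_in has shape (hidden, d_model); w_out has shape (d_model, hidden).
    Returns the pruned matrices and the indices of the kept neurons.
    """
    hidden = len(w_in)
    keep_n = max(1, round(hidden * (1.0 - ratio)))
    scores = neuron_importance(w_in, w_out)
    ranked = sorted(range(hidden), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:keep_n])  # preserve the original neuron order
    pruned_in = [w_in[j] for j in keep]
    pruned_out = [[row[j] for j in keep] for row in w_out]
    return pruned_in, pruned_out, keep

def random_init(rows, cols, rng, scale=0.02):
    """Gaussian random initialization, the from-scratch baseline."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]

rng = random.Random(0)
d_model, d_hidden = 4, 10  # toy sizes, far smaller than any real model
w_in = random_init(d_hidden, d_model, rng)   # stands in for pretrained weights
w_out = random_init(d_model, d_hidden, rng)

for ratio in (0.5, 0.8):
    p_in, p_out, keep = prune_hidden(w_in, w_out, ratio)
    # The from-scratch baseline has the same shape but fresh random weights.
    scratch_in = random_init(len(p_in), d_model, rng)
    scratch_out = random_init(d_model, len(p_in), rng)
    print(f"ratio={ratio}: kept {len(keep)} of {d_hidden} hidden neurons")
```

Both starting points have the same shape and would be trained on the same token budget. The only difference is whether the weights are inherited from the larger model or drawn at random, which is the comparison the summary describes.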
pruning, training, llm, small models