Training
Fragile Knowledge, Robust Instruction-Following: The Width Pruning Dichotomy in Llama-3.2
This article reports findings from structured width pruning of the GLU-MLP layers in Llama-3.2 models, with neurons selected by the Peak-to-Peak Magnitude (PPM) criterion. Reducing the expansion ratio degrades performance on parametric-knowledge tasks but improves instruction-following capabilities at a 2.4x equilibrium ratio, with notable gains in both Llama-3.2-1B and Llama-3.2-3B. The results identify the expansion ratio as a critical architectural parameter that selectively shapes model capabilities, which gives practitioners a concrete lever for optimizing LLMs for specific tasks.
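To make the mechanics concrete, here is a minimal sketch of width pruning on a single GLU-MLP block, in pure Python with toy dimensions. It rests on several assumptions that the summary does not confirm: the PPM score of an intermediate neuron is taken to be the peak-to-peak range (max minus min) of its weights, summed over the gate and up projections; the expansion ratio is taken to be intermediate size divided by hidden size; and the helper names `target_intermediate_size`, `peak_to_peak`, and `prune_glu_mlp` are hypothetical. The article's actual scoring rule may differ.

```python
import random

def target_intermediate_size(hidden_size, expansion_ratio):
    """Intermediate width that yields the requested expansion ratio."""
    return round(hidden_size * expansion_ratio)

def peak_to_peak(weights):
    """Range of a weight vector: max minus min."""
    return max(weights) - min(weights)

def prune_glu_mlp(gate, up, down, keep):
    """Keep the `keep` highest-scoring intermediate neurons.

    gate, up: [intermediate][hidden] weight rows, one per neuron.
    down:     [hidden][intermediate] weight rows, one column per neuron.
    """
    # Assumed scoring rule (not confirmed by the article):
    # sum of the peak-to-peak ranges of each neuron's gate and up rows.
    scores = [peak_to_peak(g) + peak_to_peak(u) for g, u in zip(gate, up)]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    kept = sorted(ranked[:keep])  # restore original neuron order
    # The same neuron indices must be removed from all three projections.
    new_gate = [gate[i] for i in kept]
    new_up = [up[i] for i in kept]
    new_down = [[row[i] for i in kept] for row in down]
    return new_gate, new_up, new_down, kept

# Toy block: hidden size 5, intermediate size 20 (expansion ratio 4.0).
random.seed(0)
hidden, intermediate = 5, 20
gate = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(intermediate)]
up = [[random.gauss(0, 1) for _ in range(hidden)] for _ in range(intermediate)]
down = [[random.gauss(0, 1) for _ in range(intermediate)] for _ in range(hidden)]

# Prune down to an expansion ratio of 2.4.
keep = target_intermediate_size(hidden, 2.4)
new_gate, new_up, new_down, kept = prune_glu_mlp(gate, up, down, keep)
print(f"kept {len(kept)} of {intermediate} neurons, ratio {len(kept) / hidden}")
```

The point of the sketch is that width pruning is structured: a pruned neuron disappears as a row of the gate and up projections and as a column of the down projection, so the block stays dense and only its intermediate width shrinks.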
llama, pruning, fine-tuning