Training
Beyond Layer Importance in Layer-wise Sparsity: An Inter-Layer Perturbation-Absorption Perspective
This paper studies layer-wise sparsity in large language models (LLMs) through the lens of inter-layer perturbation absorption. It shows empirically that early layers amplify perturbations while middle and late layers absorb them, and uses this observation to define a per-layer absorption coefficient. The resulting absorption-aware correction improves existing layer-wise pruning methods such as OWL and AlphaPruning, achieving a 7.13% reduction in perplexity and a 1.02% improvement in zero-shot accuracy at 70% sparsity.
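To make the amplify-versus-absorb distinction concrete, here is a minimal toy sketch. It is not the paper's definition of the absorption coefficient, which this summary does not give. It assumes a hypothetical metric, the ratio of the output change to the input change for a single layer, and uses made-up elementwise scaling layers in place of real transformer blocks. Under that assumption, a ratio above 1 means the layer amplifies a perturbation and a ratio below 1 means it absorbs it.

```python
import math
import random


def l2_norm(v):
    """Euclidean norm of a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def perturbation_ratio(layer, x, delta):
    """Hypothetical metric (an assumption, not the paper's coefficient):
    ||layer(x + delta) - layer(x)|| / ||delta||."""
    clean = layer(x)
    perturbed = layer([xi + di for xi, di in zip(x, delta)])
    out_diff = [p - c for p, c in zip(perturbed, clean)]
    return l2_norm(out_diff) / l2_norm(delta)


def make_scaling_layer(gain):
    """Toy stand-in for a network layer: multiplies every element by `gain`."""
    return lambda v: [gain * vi for vi in v]


rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(8)]         # toy activation vector
delta = [rng.gauss(0, 0.01) for _ in range(8)]  # small input perturbation

amplifying_layer = make_scaling_layer(1.5)  # plays the role of an "early" layer
absorbing_layer = make_scaling_layer(0.5)   # plays the role of a "late" layer

ratio_amp = perturbation_ratio(amplifying_layer, x, delta)
ratio_abs = perturbation_ratio(absorbing_layer, x, delta)

print(f"amplifying layer ratio: {ratio_amp:.3f}")
print(f"absorbing layer ratio:  {ratio_abs:.3f}")
```

For these linear toy layers the ratio equals the gain, so the two cases separate cleanly. In a real model the ratio would depend on the input and on the perturbation, and would have to be estimated over data.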
llm sparsity pruning