Beyond Layer Importance in Layer-wise Sparsity: An Inter-Layer Perturbation-Absorption Perspective
Summary
A new study challenges conventional layer-wise sparsity allocation in Large Language Models (LLMs) by introducing an "inter-layer perturbation-absorption perspective." Current pruning methods typically set each layer's sparsity from its local importance; this research instead asks how well the rest of the network compensates for the damage pruning introduces. Empirically, LLM layers respond heterogeneously to pruning-scale perturbations: early layers amplify disturbances, whereas middle and late layers actively absorb them, with relative L2 drift decreasing monotonically across depth. Absorption appears specifically under large perturbations; under small perturbations the network amplifies instead. Based on these findings, the authors define an absorption coefficient per layer and propose an "absorption-aware correction." The correction is orthogonal to existing allocation schemes and can be layered on top of methods such as OWL and AlphaPruning, where the reported gains at 70% sparsity across various model families are a 7.13% reduction in perplexity and a 1.02% improvement in zero-shot accuracy.
Key takeaway
For Machine Learning Engineers optimizing LLM compression, consider adding an "absorption-aware correction" to your layer-wise sparsity strategy. The correction accounts for how later layers absorb the perturbations that pruning introduces, rather than relying on local layer importance alone. At 70% sparsity, the authors report a 7.13% reduction in perplexity and a 1.02% improvement in zero-shot accuracy over the uncorrected OWL and AlphaPruning baselines.
Key insights
The network's compensatory capacity, not just local layer importance, dictates optimal layer-wise sparsity in LLMs.
Principles
- Early LLM layers amplify pruning perturbations.
- Middle and late layers absorb pruning perturbations.
- Absorption is a large-perturbation phenomenon.
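The measurement behind these principles can be illustrated on a toy model. The sketch below is not the authors' experimental code: the residual tanh blocks, the Gaussian noise, and the names `relative_l2_drift` and `drift_profile` are all assumptions made for illustration. It injects a perturbation into one layer's output and tracks the relative L2 drift of the hidden state through the remaining layers, which is the kind of quantity the summary describes.

```python
import numpy as np

def relative_l2_drift(clean, perturbed):
    """Relative L2 drift ||perturbed - clean|| / ||clean|| of a hidden state."""
    return float(np.linalg.norm(perturbed - clean) / np.linalg.norm(clean))

def drift_profile(layers, x, inject_at, noise):
    """Add `noise` to the output of layer `inject_at`, then record the relative
    L2 drift of the hidden state at that layer and at every later layer."""
    clean, pert = x, x
    profile = []
    for i, layer in enumerate(layers):
        clean, pert = layer(clean), layer(pert)
        if i == inject_at:
            pert = pert + noise
        if i >= inject_at:
            profile.append(relative_l2_drift(clean, pert))
    return profile

# Hypothetical stand-in for a transformer stack: random residual tanh blocks.
rng = np.random.default_rng(0)
dim, depth = 64, 8
weights = [rng.normal(scale=dim ** -0.5, size=(dim, dim)) for _ in range(depth)]
layers = [lambda h, W=W: h + np.tanh(h @ W) for W in weights]

x = rng.normal(size=(16, dim))
noise = 0.5 * rng.normal(size=(16, dim))  # assumed "large" perturbation scale
profile = drift_profile(layers, x, inject_at=0, noise=noise)
for offset, drift in enumerate(profile):
    print(f"layer {offset}: relative L2 drift = {drift:.3f}")
```

A drift that shrinks with depth would indicate absorption, and one that grows would indicate amplification. A random toy stack like this one is not expected to reproduce the pattern reported for trained LLMs; on a real model the same measurement would be taken on actual hidden states, with the perturbation coming from pruning a given layer.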
Method
Define an absorption coefficient per layer based on controlled perturbation experiments. Apply "absorption-aware correction" as an orthogonal augmentation to existing layer-wise pruning methods.
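To make the idea of an orthogonal augmentation concrete, here is a minimal sketch of how a correction could be applied on top of per-layer sparsity ratios produced by another method. The summary does not give the paper's formula, so everything here is an assumption: the function name `absorption_aware_correction`, the `strength` parameter, the linear shift, the example numbers, and the convention that a higher coefficient means a layer's perturbations are better absorbed downstream and the layer can therefore be pruned more.

```python
import numpy as np

def absorption_aware_correction(base_sparsity, absorption, strength=0.05):
    """Shift each layer's sparsity in proportion to its centered absorption
    coefficient. The shift is zero-mean, so the average sparsity is unchanged.

    Hypothetical rule for illustration only, not the paper's formula.
    Assumes all layers have the same number of prunable weights, so the mean
    of the per-layer ratios equals the global sparsity.
    """
    base = np.asarray(base_sparsity, dtype=float)
    a = np.asarray(absorption, dtype=float)
    if base.shape != a.shape:
        raise ValueError("need one absorption coefficient per layer")
    spread = a.max() - a.min()
    if spread == 0:
        return base.copy()  # uniform absorption: nothing to correct
    centered = (a - a.mean()) / spread  # zero mean, each entry within [-1, 1]
    corrected = base + strength * centered
    if corrected.min() < 0.0 or corrected.max() >= 1.0:
        raise ValueError("strength too large: sparsity left the range [0, 1)")
    return corrected

# Made-up inputs: ratios such as an OWL-style allocator might emit (mean 0.70)
# and made-up absorption coefficients that are low early and higher later.
base = [0.62, 0.68, 0.72, 0.74, 0.74]
absorption = [0.1, 0.4, 0.8, 0.9, 0.8]
corrected = absorption_aware_correction(base, absorption)
print("corrected:", np.round(corrected, 4), "mean:", round(corrected.mean(), 4))
```

Keeping the shift zero-mean is what lets the correction sit on top of any base allocator without changing the overall sparsity budget; only the distribution of sparsity across layers moves.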
In practice
- Augment OWL/AlphaPruning with absorption-aware correction.
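- Expect lower perplexity and higher zero-shot accuracy than the uncorrected baselines (reported: 7.13% and 1.02%, respectively).
- The reported gains are measured at 70% sparsity across various model families.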
Topics
- Large Language Models
- LLM Compression
- Model Pruning
- Layer-wise Sparsity
- Perturbation Absorption
- Neural Network Optimization
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.