Beyond Layer Importance in Layer-wise Sparsity: An Inter-Layer Perturbation-Absorption Perspective

2026-06-13 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new study challenges conventional layer-wise sparsity allocation in large language models (LLMs) by introducing an inter-layer perturbation-absorption perspective. Current pruning methods typically allocate sparsity according to local layer importance; this work argues that the capacity of subsequent layers to compensate for pruning error matters as well. Empirically, LLM layers respond heterogeneously to pruning-scale perturbations: early layers amplify disturbances, whereas middle and late layers absorb them, with relative L2 drift decreasing monotonically across depth. Absorption appears under large perturbations; under small perturbations, amplification is observed instead. Building on these observations, the authors define a per-layer absorption coefficient and propose an absorption-aware correction. The correction is orthogonal to existing techniques such as OWL and AlphaPruning and, applied on top of them, is reported to reduce perplexity by 7.13% and raise zero-shot accuracy by 1.02% across several model families at 70% sparsity.

Key takeaway

For machine learning engineers compressing LLMs: consider adding an absorption-aware correction to your layer-wise sparsity allocation. By accounting for how later layers absorb pruning perturbations, the correction is reported to reduce perplexity by 7.13% and raise zero-shot accuracy by 1.02% at 70% sparsity when applied on top of existing methods such as OWL and AlphaPruning.

Key insights

Layer-wise sparsity allocation in LLMs should account for the network's downstream compensatory capacity, not only local layer importance.

Principles

Early LLM layers amplify pruning perturbations.
Middle and late layers absorb pruning perturbations.
Absorption appears under large, pruning-scale perturbations; small perturbations are amplified instead.
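
The relative L2 drift behind these principles can be measured with a simple probe: perturb the hidden state at one layer, propagate both the clean and the perturbed state, and compare them after every later layer. The sketch below is a minimal illustration in NumPy, not the authors' experimental setup. The toy residual layers, the function names (`relative_l2_drift`, `drift_profile`, `make_toy_layers`), and the perturbation scales are all assumptions, and the toy network is not expected to reproduce the amplification and absorption pattern reported for real LLMs.

```python
import numpy as np

def relative_l2_drift(clean, perturbed):
    """Relative L2 distance ||perturbed - clean|| / ||clean||."""
    return float(np.linalg.norm(perturbed - clean) / np.linalg.norm(clean))

def drift_profile(layers, x, inject_at, scale, rng):
    """Perturb the state entering layer `inject_at`; record drift after each later layer."""
    clean, pert = x.copy(), x.copy()
    drifts = []
    for i, layer in enumerate(layers):
        if i == inject_at:
            noise = rng.standard_normal(pert.shape)
            # Rescale the noise so its norm is `scale` times the clean state's norm.
            noise *= scale * np.linalg.norm(clean) / np.linalg.norm(noise)
            pert = pert + noise
        clean, pert = layer(clean), layer(pert)
        if i >= inject_at:
            drifts.append(relative_l2_drift(clean, pert))
    return drifts

def make_toy_layers(depth, width, rng):
    """Hypothetical stand-in for transformer blocks: residual tanh layers."""
    weights = [rng.standard_normal((width, width)) / np.sqrt(width) for _ in range(depth)]
    return [lambda h, W=W: h + np.tanh(h @ W) for W in weights]

rng = np.random.default_rng(0)
layers = make_toy_layers(depth=8, width=64, rng=rng)
x = rng.standard_normal((4, 64))

# Compare a small and a large perturbation injected at the first layer.
small = drift_profile(layers, x, inject_at=0, scale=0.01, rng=rng)
large = drift_profile(layers, x, inject_at=0, scale=1.0, rng=rng)
```

With a real model, `layers` would be replaced by the actual transformer blocks and the injected noise by the error introduced when a layer is pruned. A drift profile that falls with depth indicates absorption; one that rises indicates amplification.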

Method

Define an absorption coefficient per layer based on controlled perturbation experiments. Apply "absorption-aware correction" as an orthogonal augmentation to existing layer-wise pruning methods.
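
The summary does not give the correction formula, so the following is only one plausible shape such a correction could take, sketched under stated assumptions. It assumes a per-layer absorption coefficient where a higher value means that layer's pruning error is absorbed more strongly downstream, and it shifts a baseline allocation (for example from OWL or AlphaPruning) toward those layers while keeping the mean sparsity unchanged. The function name `absorption_aware_correction`, the `strength` parameter, and the example numbers are hypothetical.

```python
import numpy as np

def absorption_aware_correction(base_sparsity, absorption, strength=0.05):
    """Shift per-layer sparsity toward layers whose perturbations are absorbed more.

    Assumed form (not taken from the source):
        s'_l = s_l + strength * (a_l - mean(a)) / max_l |a_l - mean(a)|
    The shift has zero mean, so the average sparsity across layers is unchanged
    (this matches the global target only if layers have equal parameter counts).
    """
    s = np.asarray(base_sparsity, dtype=float)
    a = np.asarray(absorption, dtype=float)
    centered = a - a.mean()
    span = np.abs(centered).max()
    if span == 0:
        # Uniform absorption carries no signal; keep the baseline allocation.
        return s.copy()
    corrected = s + strength * centered / span
    if corrected.min() < 0 or corrected.max() > 1:
        raise ValueError("strength too large: corrected sparsity left [0, 1]")
    return corrected

# Hypothetical baseline allocation with mean 0.70 and made-up absorption coefficients.
base = [0.65, 0.70, 0.72, 0.73]
absorption = [0.2, 0.5, 0.8, 0.9]
corrected = absorption_aware_correction(base, absorption)
```

Because the shift is centered, the correction composes with any baseline allocation without changing the overall sparsity budget, which is one way to read "orthogonal augmentation". The actual definition of the coefficient and of the correction should be taken from the original paper.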

In practice

Augment OWL/AlphaPruning with absorption-aware correction.
Lower perplexity and raise zero-shot accuracy after pruning.
Retain more model quality at 70% sparsity.

Topics

Large Language Models
LLM Compression
Model Pruning
Layer-wise Sparsity
Perturbation Absorption
Neural Network Optimization

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.