Two-Stage Regularization-Based Structured Pruning for LLMs

Source: cs.AI updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

ELDeR (Efficient LLMs through Data-Driven Regularized Layer-wise Pruning) is a structured pruning method that cuts the compute and memory cost of Large Language Models (LLMs) without significant performance degradation. Traditional methods prune first and then fine-tune, which often requires costly recovery fine-tuning (RFT). ELDeR reverses the order: it regularizes first and prunes afterwards. Using a small dataset, it iteratively learns a weight for each transformer layer, then applies $\ell_{1}$-norm or $\ell_{2}$-norm regularization to the difference between the input and output of the layers with the smallest weights. This pushes the information carried by those layers into the layers that remain, so removing them afterwards costs little. In experiments on LLaMA2-7B, LLaMA2-13B, LLaMA3-8B, OPT-2.7B, OPT-13B, and Phi-2, ELDeR achieves better perplexity and accuracy than other layer-wise structured pruning methods (SLEB, ShortGPT, and LaCo) while significantly reducing the computational cost of RFT. For instance, ELDeR's perplexity on LLaMA2-7B is 20% lower than ShortGPT's, and at a 50% pruning ratio on OPT-13B it delivers a 75% throughput increase and a 46% latency reduction.
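One way to write down the regularization described above is as an extra penalty on the training loss. This is a sketch inferred from the summary, not the paper's exact objective: the symbols $\mathcal{L}_{\text{task}}$ (the loss on the small dataset), $\lambda$ (penalty strength), $\mathcal{S}$ (the set of low-weight layers), and $x^{(k)}_{\text{in}}$, $x^{(k)}_{\text{out}}$ (the input and output of layer $k$) are all assumed notation.

$$
\mathcal{L} \;=\; \mathcal{L}_{\text{task}} \;+\; \lambda \sum_{k \in \mathcal{S}} \left\lVert x^{(k)}_{\text{out}} - x^{(k)}_{\text{in}} \right\rVert_{p}, \qquad p \in \{1, 2\}
$$

Under this reading, driving the penalty toward zero makes each layer in $\mathcal{S}$ behave almost like the identity map on the data, which is why deleting those layers afterwards should change the model's output very little.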

Key takeaway

For AI Engineers and Research Scientists optimizing LLM deployment, ELDeR is an alternative to the traditional prune-then-finetune pipeline. Its regularization-then-prune paradigm delivers substantial compression and acceleration (e.g., 1.75x throughput on OPT-13B at a 50% pruning ratio) while maintaining performance on generation and zero-shot tasks, often without extensive recovery fine-tuning. The result is lower computational overhead and more efficient use of resources when deploying large models.

Key insights

Regularizing before pruning transfers information out of the layers that will be removed, which preserves LLM performance and reduces the need for recovery fine-tuning.

Method

ELDeR proceeds in three steps: (1) it iteratively learns a weight for each layer under an $\ell_{1}$-norm loss; (2) it applies $\ell_{1}$-norm or $\ell_{2}$-norm regularization to the input-output difference of the layers with the smallest weights; (3) it prunes those layers.
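The selection, penalty, and pruning steps can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the function names, the toy hidden states, and the already-learned `layer_weights` are hypothetical, the weight-learning step itself is omitted, and the $\ell_{2}$ variant is assumed to be the unsquared norm.

```python
import math


def select_low_weight_layers(layer_weights, num_to_prune):
    """Return the indices of the layers with the smallest learned weights."""
    order = sorted(range(len(layer_weights)), key=lambda i: layer_weights[i])
    return sorted(order[:num_to_prune])


def difference_penalty(layer_input, layer_output, norm="l1"):
    """Norm of (output - input) for one layer's hidden-state vector."""
    diff = [o - i for i, o in zip(layer_input, layer_output)]
    if norm == "l1":
        return sum(abs(d) for d in diff)
    if norm == "l2":
        # Assumption: unsquared Euclidean norm.
        return math.sqrt(sum(d * d for d in diff))
    raise ValueError(f"unknown norm: {norm!r}")


def regularization_term(hidden_states, candidates, norm="l1"):
    """Sum the penalty over candidate layers.

    hidden_states[k] is the input to layer k and hidden_states[k + 1]
    is its output, so the list has one more entry than there are layers.
    """
    return sum(
        difference_penalty(hidden_states[k], hidden_states[k + 1], norm)
        for k in candidates
    )


def prune_layers(layers, candidates):
    """Drop the candidate layers, keeping the rest in their original order."""
    drop = set(candidates)
    return [layer for idx, layer in enumerate(layers) if idx not in drop]


# Toy example: four layers, prune the two with the smallest weights.
layer_weights = [0.9, 0.1, 0.7, 0.2]          # hypothetical learned weights
candidates = select_low_weight_layers(layer_weights, 2)

# Hypothetical hidden states before layer 0 and after each of the 4 layers.
hidden_states = [
    [1.0, 2.0],
    [1.5, 2.0],
    [1.5, 2.5],
    [0.5, 2.5],
    [0.5, 2.5],
]
penalty_l1 = regularization_term(hidden_states, candidates, norm="l1")
penalty_l2 = regularization_term(hidden_states, candidates, norm="l2")
remaining = prune_layers(["layer0", "layer1", "layer2", "layer3"], candidates)
```

In this toy run layer 3 already leaves its input unchanged, so it contributes nothing to the penalty, while layer 1 still contributes. In a real training loop the penalty would be added to the loss and minimized by gradient descent before the candidate layers are removed.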

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.