CRePE: Convolution-aware Relative Importance in Post-training Pruning with Efficient Search

2026-06-01 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, medium

Summary

CRePE is a post-training pruning (PTP) method aimed at reducing the memory and computational costs of Large Language Models (LLMs). It extends the Relative Importance and Activations (RIA) score with 2D local neighborhood context and adaptive coefficients, going beyond the 1D cross-shaped (row and column) information that RIA uses. According to the paper, CRePE consistently outperforms other PTP methods across various models and sparsity configurations. The main cost is the search for the adaptive coefficients: hill climbing guided by perplexity (PPL) takes about 11 hours. To address this, the paper introduces PHO (Proxy-based Hyperparameter Optimization), which avoids repeated PPL measurements and cuts the search to approximately 20 minutes, roughly a 33x reduction. PHO also generalizes well: hyperparameters found on one model transfer effectively to others. The paper further verifies that CRePE is orthogonal to, and can be combined with, Channel Permutation, non-uniform sparsity allocation, and re-pruning.

Key takeaway

For MLOps engineers deploying Large Language Models, CRePE is a post-training pruning option for reducing memory and computational costs. Consider pairing it with PHO, which shortens the coefficient search from about 11 hours to about 20 minutes and yields hyperparameters that transfer across models, so tuning does not have to be repeated from scratch for each deployment.

Key insights

CRePE improves LLM post-training pruning by using 2D context and adaptive coefficients, with PHO accelerating hyperparameter optimization.

Principles

2D local context enhances pruning importance scores.
Adaptive coefficients improve pruning performance.
Proxy-based optimization speeds up hyperparameter search, and the resulting hyperparameters transfer across models.

Method

CRePE incorporates 2D local neighborhood context and adaptive coefficients into RIA-style importance scoring for LLM pruning. PHO tunes these coefficients against a proxy instead of repeated PPL measurements, reducing search time from about 11 hours to about 20 minutes.
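As a rough illustration of the scoring idea described above, the sketch below computes an RIA-style score and then blends in a 2D neighborhood average. The RIA form used here (weight magnitude relative to its row and column sums, scaled by an activation norm raised to a power) is the commonly cited one. The k x k averaging, the single blend coefficient `alpha`, the per-row pruning rule, and all function names are assumptions made for illustration; this summary does not give CRePE's actual formula.

```python
import numpy as np

def ria_score(W, act_norm, a=0.5):
    """RIA-style score: |W_ij| relative to its row and column sums,
    scaled by the input-channel activation norm raised to the power a."""
    absW = np.abs(W)
    eps = 1e-12  # guards against division by zero for all-zero rows/columns
    row_sum = absW.sum(axis=1, keepdims=True)  # one sum per output channel
    col_sum = absW.sum(axis=0, keepdims=True)  # one sum per input channel
    rel = absW / (row_sum + eps) + absW / (col_sum + eps)
    return rel * (act_norm[None, :] ** a)

def local_context_2d(S, k=3):
    """Mean of S over a k x k neighborhood, using edge padding (k odd)."""
    pad = k // 2
    P = np.pad(S, pad, mode="edge")
    out = np.zeros_like(S, dtype=float)
    for di in range(k):
        for dj in range(k):
            out += P[di:di + S.shape[0], dj:dj + S.shape[1]]
    return out / (k * k)

def blended_score(W, act_norm, alpha=0.5, k=3):
    """Hypothetical blend: (1 - alpha) * RIA score + alpha * 2D local mean."""
    base = ria_score(W, act_norm)
    return (1.0 - alpha) * base + alpha * local_context_2d(base, k)

def prune_by_score(W, score, sparsity=0.5):
    """Zero the lowest-scoring fraction of weights in each row."""
    n_prune = int(W.shape[1] * sparsity)
    idx = np.argsort(score, axis=1)[:, :n_prune]
    mask = np.ones(W.shape, dtype=bool)
    np.put_along_axis(mask, idx, False, axis=1)
    return W * mask, mask
```

Here `W` is a weight matrix of shape (outputs, inputs) and `act_norm` is a vector of per-input-channel activation norms collected from calibration data. Setting `alpha=0.0` recovers the plain RIA-style score, which makes it easy to compare the two variants on the same layer.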

In practice

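Apply CRePE to prune a trained LLM and cut its memory and compute footprint at deployment.
Use PHO to tune CRePE's adaptive coefficients in about 20 minutes instead of about 11 hours.
Combine CRePE with Channel Permutation, non-uniform sparsity allocation, or re-pruning.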
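To make the search cost concrete, here is a minimal coordinate hill-climbing sketch over a small coefficient vector. It is generic and not taken from the paper: the step schedule, the bounds, and the function name `hill_climb` are assumptions, and the quadratic `toy_objective` is only a stand-in. In a PPL-guided search each objective call would mean pruning the model and measuring perplexity, which is why the number of calls dominates the runtime. A proxy-based search such as PHO would plug a cheaper objective into the same loop; the summary does not specify what that proxy is.

```python
import numpy as np

def hill_climb(objective, x0, step=0.1, min_step=1e-3, bounds=(0.0, 1.0)):
    """Coordinate hill climbing that minimizes objective(x).

    Returns the best point, its objective value, and the number of
    objective evaluations (the expensive part when each call is a PPL run).
    """
    x = np.asarray(x0, dtype=float).copy()
    best = objective(x)
    n_evals = 1
    while step >= min_step:
        improved = False
        for i in range(x.size):
            for delta in (step, -step):
                cand = x.copy()
                cand[i] = np.clip(cand[i] + delta, bounds[0], bounds[1])
                val = objective(cand)
                n_evals += 1
                if val < best:  # accept only strict improvements
                    x, best, improved = cand, val, True
        if not improved:
            step /= 2.0  # refine the search once no neighbor is better
    return x, best, n_evals

# Stand-in objective (assumption): a cheap function with a known minimum.
target = np.array([0.3, 0.7])

def toy_objective(coeffs):
    return float(np.sum((coeffs - target) ** 2))

coeffs, value, n_evals = hill_climb(toy_objective, x0=[0.5, 0.5])
```

Multiplying `n_evals` by the cost of a single evaluation gives a quick estimate of total search time, which is the quantity a cheaper proxy objective reduces.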

Topics

Large Language Models
Model Pruning
Post-training Pruning
Hyperparameter Optimization
Neural Network Compression
LLM Deployment

Code references

dvlab-research/FocalsConv

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.