Mix-and-Match Pruning: Globally Guided Layer-Wise Sparsification of DNNs

Source: cs.CV updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning) · Depth: Advanced, long

Summary

Mix-and-Match Pruning is a framework for compressing deep neural networks (DNNs) for edge-device deployment by generating diverse, high-quality pruning configurations. It addresses the limitation of single-strategy pruning methods by using globally guided, layer-wise sparsification. The framework operates in three phases: (1) sensitivity analysis, which assigns architecture-aware sparsity ranges to each layer (e.g., 0% for normalization layers, [0%, 10%] for small layers, [15%, 30%] for Transformer patch embeddings); (2) systematic sampling of these ranges to create ten distinct pruning strategies; and (3) pruning followed by fine-tuning. This process yields multiple Pareto-optimal accuracy-sparsity trade-offs from a single pruning run, eliminating the need for repeated executions. Experiments on CNNs (VGG-11, ResNet-18) and Vision Transformers (LeViT-384, Swin-Tiny) show competitive or superior performance: accuracy degradation on Swin-Tiny is reduced by 40% relative to standard single-criterion pruning, while VGG-11 shrinks to ~10 MB and ResNet-18 to ~4.5 MB.

Key takeaway

For AI engineers deploying DNNs on memory-constrained edge devices, Mix-and-Match Pruning offers a systematic way to achieve strong compression without extensive trial and error. A single pruning run yields multiple Pareto-optimal accuracy-sparsity configurations, which reduces development time and computational cost compared with traditional single-strategy methods. Consider applying its architecture-aware sparsity ranges to tailor pruning aggressiveness to specific layer types, preserving accuracy while minimizing model size.
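
To make "Pareto-optimal" concrete, here is a minimal sketch of how the non-dominated configurations could be selected once each candidate has a measured sparsity and accuracy. The function name `pareto_front` and the example numbers are hypothetical placeholders for illustration; they are not results or code from the paper.

```python
def pareto_front(configs):
    """Return the configurations that no other configuration dominates.

    configs: list of (name, sparsity, accuracy) tuples, where higher is
    better for both sparsity and accuracy.
    """
    front = []
    for name, sparsity, accuracy in configs:
        # A configuration is dominated if another one is at least as good
        # on both axes and strictly better on at least one of them.
        dominated = any(
            other_sp >= sparsity
            and other_acc >= accuracy
            and (other_sp > sparsity or other_acc > accuracy)
            for _, other_sp, other_acc in configs
        )
        if not dominated:
            front.append((name, sparsity, accuracy))
    # Sort by sparsity so the trade-off curve reads left to right.
    return sorted(front, key=lambda c: c[1])


# Placeholder values only, NOT measurements from the paper.
candidates = [
    ("a", 0.50, 0.90),
    ("b", 0.60, 0.88),
    ("c", 0.55, 0.85),  # dominated by "b": sparser and more accurate
    ("d", 0.70, 0.80),
]
front_names = [name for name, _, _ in pareto_front(candidates)]
```

With these placeholder values, `front_names` keeps "a", "b", and "d" and drops "c". An engineer would then pick the point on this front that fits the device's memory budget.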

Key insights

Coordinating existing pruning signals through architecture-aware, layer-wise sparsification yields more efficient and reliable DNN compression.

Principles

Method

The framework computes sensitivity scores once, assigns architecture-aware sparsity ranges per layer, and then systematically samples these ranges to generate ten distinct pruning strategies for a single fine-tuning run.
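
The range-assignment and sampling steps can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the `Layer` type, the `SMALL_LAYER_PARAMS` threshold, the `DEFAULT_RANGE` for layer types the summary does not mention, and the evenly spaced sampling rule are all hypothetical. Only the three example ranges (0% for normalization layers, [0%, 10%] for small layers, [15%, 30%] for patch embeddings) come from the summary, and the sensitivity scores the framework uses are omitted here.

```python
from dataclasses import dataclass


@dataclass
class Layer:
    name: str
    kind: str        # e.g. "norm", "conv", "linear", "patch_embed"
    num_params: int


# Assumption: the summary does not say what counts as a "small" layer.
SMALL_LAYER_PARAMS = 10_000
# Assumption: placeholder range for layers the summary does not cover.
DEFAULT_RANGE = (0.30, 0.70)


def sparsity_range(layer):
    """Map a layer to a (low, high) sparsity range based on its type."""
    if layer.kind == "norm":
        return (0.0, 0.0)      # normalization layers are never pruned
    if layer.kind == "patch_embed":
        return (0.15, 0.30)    # Transformer patch embeddings
    if layer.num_params < SMALL_LAYER_PARAMS:
        return (0.0, 0.10)     # small layers are pruned only lightly
    return DEFAULT_RANGE


def sample_strategies(layers, num_strategies=10):
    """Build per-layer sparsity targets by stepping through each range.

    Assumption: "systematic sampling" is modeled as evenly spaced points,
    so strategy 0 uses every range's lower bound and the last strategy
    uses every upper bound.
    """
    strategies = []
    for k in range(num_strategies):
        t = k / (num_strategies - 1) if num_strategies > 1 else 0.0
        strategy = {}
        for layer in layers:
            low, high = sparsity_range(layer)
            strategy[layer.name] = low + t * (high - low)
        strategies.append(strategy)
    return strategies


# Hypothetical layers for illustration.
layers = [
    Layer("norm1", "norm", 768),
    Layer("patch_embed", "patch_embed", 500_000),
    Layer("head", "linear", 5_000),
    Layer("fc", "linear", 1_000_000),
]
strategies = sample_strategies(layers)
```

Each entry in `strategies` is a dictionary of per-layer sparsity targets that a pruning routine could consume. Because the ranges depend only on the architecture, they are computed once and reused across all ten strategies.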

Best for: AI Engineer, Computer Vision Engineer, AI Scientist, AI Researcher, Machine Learning Engineer, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.