Finding Sparse Subnetworks in One Training Cycle via Progressive Magnitude-Based Pruning
Summary
Progressive magnitude-based pruning is a single-cycle method for neural network sparsification. It avoids the repeated train-prune-retrain cycles required by approaches based on the Lottery Ticket Hypothesis (LTH). Sparsity is increased gradually during training on a linear schedule, and the pruning masks are updated according to the magnitudes of the weights that are still active. The method was evaluated on CIFAR-10 and MNIST with ResNet, VGG-style, and LeNet architectures. On CIFAR-10, it achieved 95.12% accuracy on ResNet-18 at 72.9% sparsity, compared with LTH's reported 90.5%. At extreme sparsity, it reached 93.13% accuracy on a VGG-like architecture at 97% sparsity, compared with approximately 92.0% for SNIP, and 93.44% accuracy on VGG-19 at 97.97% sparsity, compared with GraSP's 92.19% at 98% sparsity. Accuracy on ResNet-18 remained within 0.1 percentage points of the dense baseline across 70-85% sparsity.
Key takeaway
For Machine Learning Engineers optimizing model deployment, progressive magnitude-based pruning removes the main cost of iterative pruning: the repeated training cycles. You can reach high sparsity with competitive accuracy in a single training run, which reduces the training time and compute needed to produce a compact network. Consider it when sparsification is a bottleneck in your model development workflow.
Key insights
Progressive magnitude-based pruning enables effective neural network sparsification within a single training cycle.
Principles
- Iterative pruning methods often demand multiple training cycles.
- Sparsity can be increased progressively during a single training run.
- Weight magnitudes can guide dynamic pruning mask updates.
Method
Sparsity is gradually increased via a linear schedule throughout training, with pruning masks dynamically updated based on the magnitudes of active weights.
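A minimal sketch of this loop in NumPy follows; it is illustrative, not the paper's implementation. The function names `linear_sparsity` and `update_mask`, the starting sparsity of 0, the per-step mask update, and the rule that pruned weights stay pruned are all assumptions made for the example. The summary specifies only a linear schedule and magnitude-based masks over active weights.

```python
import numpy as np

def linear_sparsity(step, total_steps, final_sparsity, initial_sparsity=0.0):
    """Target sparsity at `step`, ramping linearly from initial to final.

    Hypothetical helper: the starting value of 0.0 is an assumption.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return initial_sparsity + (final_sparsity - initial_sparsity) * frac

def update_mask(weights, mask, target_sparsity):
    """Prune the smallest-magnitude active weights until the target is met.

    Assumption: weights that are already pruned stay pruned, so only
    active weights are ranked by magnitude.
    """
    n_total = weights.size
    n_target_pruned = int(round(target_sparsity * n_total))
    n_already_pruned = int(n_total - mask.sum())
    n_new = n_target_pruned - n_already_pruned
    if n_new <= 0:
        return mask
    # Give pruned positions infinite magnitude so they are never re-selected.
    magnitudes = np.where(mask, np.abs(weights), np.inf).ravel()
    prune_idx = np.argpartition(magnitudes, n_new - 1)[:n_new]
    new_mask = mask.copy().ravel()
    new_mask[prune_idx] = False
    return new_mask.reshape(mask.shape)

# Toy run on one random weight matrix standing in for a layer.
rng = np.random.default_rng(0)
weights = rng.normal(size=(64, 32))
mask = np.ones_like(weights, dtype=bool)
total_steps = 100

for step in range(1, total_steps + 1):
    # A real training step (forward, backward, optimizer update) would go here.
    target = linear_sparsity(step, total_steps, final_sparsity=0.9)
    mask = update_mask(weights, mask, target)
    weights = weights * mask  # zero out pruned weights

achieved_sparsity = 1.0 - mask.mean()
```

In a real training run the mask would be applied after each optimizer step so that pruned weights remain zero. Whether ranking is done per layer or globally across layers is a separate design choice that this sketch does not address.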
In practice
- Apply single-cycle pruning to ResNet, VGG, and LeNet architectures.
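- Target 70-98% sparsity on image classification tasks: reported ResNet-18 accuracy stays within 0.1 percentage points of the dense baseline at 70-85% sparsity, and the VGG results at 97-98% sparsity exceed those reported for SNIP and GraSP.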
Topics
- Neural Network Pruning
- Model Sparsification
- Progressive Pruning
- Lottery Ticket Hypothesis
- Deep Learning Architectures
- Model Compression
Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.