Pre-Warm: Input-Conditioned Weight Initialization for Convolutional Neural Networks
Summary
Pre-Warm is a zero-training-cost method for data-conditioned weight initialization of the first convolutional layer in a neural network. Before the first forward pass, Pre-Warm takes a single training batch, extracts mean-centered local patches, clusters them with MiniBatchKMeans, and applies inverse Manhattan spatial weighting. The resulting centroids initialize half of the first-layer filters; the remaining half use Kaiming initialization. Hyperparameters are set by closed-form rules: Otsu's foreground density for grayscale images, and the mean L2 norm of mean-centered patches for natural color images. In 8-seed paired experiments on five standard benchmarks (MNIST, Fashion-MNIST, CIFAR-10, SVHN, and CIFAR-100), Pre-Warm achieved statistically significant accuracy improvements over Kaiming initialization: p < 0.05 on all five datasets, including p = 0.0007 on SVHN (8/8 wins) and p = 0.0033 on CIFAR-100 (7/8 wins). It adds negligible overhead and requires minimal code changes.
Key takeaway
For machine learning engineers tuning convolutional neural networks, Pre-Warm is a simple way to improve accuracy with no added training cost. If you currently use Kaiming initialization, consider using Pre-Warm to condition the first layer's weights on the input data. The reported gains are statistically significant on all five image benchmarks tested, and the method needs no architectural changes and only a few lines of code.
Key insights
Conditioning the first convolutional layer's initial weights on input data can improve optimization trajectories at zero training cost.
Principles
- Data-driven initialization can enhance CNN performance.
- First-layer filters benefit from local patch information.
Method
Extract mean-centered patches from a single training batch, cluster them with MiniBatchKMeans, apply inverse Manhattan spatial weighting, and initialize half of the first-layer filters with the resulting centroids. The remaining half keep Kaiming initialization.
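The steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not define the weighting formula, so `1 / (1 + Manhattan distance to the patch center)` is assumed; applying the mask to the centroids after clustering and rescaling them to the Kaiming standard deviation are also assumptions; and a small hand-written mini-batch k-means stands in for scikit-learn's `MiniBatchKMeans` to keep the example dependency-free. All function names are hypothetical.

```python
import numpy as np

def extract_patches(batch, k):
    """(N, C, H, W) batch -> (num_patches, C*k*k) mean-centered patches."""
    c = batch.shape[1]
    win = np.lib.stride_tricks.sliding_window_view(batch, (k, k), axis=(2, 3))
    # win has shape (N, C, H-k+1, W-k+1, k, k); group channels per location.
    patches = win.transpose(0, 2, 3, 1, 4, 5).reshape(-1, c * k * k)
    return patches - patches.mean(axis=1, keepdims=True)

def minibatch_kmeans(x, n_clusters, rng, n_iter=50, batch_size=256):
    """Stand-in for MiniBatchKMeans: running-mean centroid updates."""
    centers = x[rng.choice(len(x), n_clusters, replace=False)].copy()
    counts = np.zeros(n_clusters)
    for _ in range(n_iter):
        mb = x[rng.choice(len(x), min(batch_size, len(x)), replace=False)]
        dist = ((mb[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        labels = dist.argmin(axis=1)
        for j in np.unique(labels):
            pts = mb[labels == j]
            counts[j] += len(pts)
            # Step size m / (n + m) keeps each center at the mean of its points.
            centers[j] += (len(pts) / counts[j]) * (pts.mean(axis=0) - centers[j])
    return centers

def inverse_manhattan_mask(k):
    """ASSUMED form: 1 / (1 + Manhattan distance to the patch center)."""
    offset = np.abs(np.arange(k) - (k - 1) / 2)
    return 1.0 / (1.0 + offset[:, None] + offset[None, :])

def prewarm_filters(batch, out_channels, k=3, seed=0):
    """Return (out_channels, C, k, k) weights: half centroids, half Kaiming."""
    rng = np.random.default_rng(seed)
    c = batch.shape[1]
    kaiming_std = np.sqrt(2.0 / (c * k * k))  # Kaiming normal, fan-in, ReLU gain
    weights = rng.normal(0.0, kaiming_std, size=(out_channels, c, k, k))

    n_data = out_channels // 2
    patches = extract_patches(batch, k)
    centers = minibatch_kmeans(patches, n_data, rng).reshape(n_data, c, k, k)
    centers = centers * inverse_manhattan_mask(k)  # assumed: applied to centroids
    # ASSUMPTION: rescale so both halves of the layer share the same scale.
    std = centers.std(axis=(1, 2, 3), keepdims=True)
    weights[:n_data] = centers * kaiming_std / (std + 1e-8)
    return weights
```

In a PyTorch pipeline, the returned array could be copied into the first layer before training, for example with `conv.weight.copy_(torch.from_numpy(w))` inside a `torch.no_grad()` block. With scikit-learn available, `sklearn.cluster.MiniBatchKMeans(n_clusters=n_data).fit(patches).cluster_centers_` would replace the hand-written clustering step.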
In practice
- Apply Pre-Warm to the first CNN layer.
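- Use Otsu's foreground density to set hyperparameters for grayscale images, and the mean L2 norm of mean-centered patches for natural color images.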
- Integrate with existing training pipelines.
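The two data statistics named in the summary can be computed as in the sketch below. It covers only the statistics themselves: the closed-form rules that map them to hyperparameters are not given in this summary and are not reproduced here. The function names, the 256-bin histogram, the assumption that pixel values lie in [0, 1], and the convention that foreground means "above the Otsu threshold" are all illustrative assumptions.

```python
import numpy as np

def otsu_threshold(img, bins=256):
    """Otsu's threshold for values in [0, 1]: maximize between-class variance."""
    hist, edges = np.histogram(img, bins=bins, range=(0.0, 1.0))
    p = hist / hist.sum()
    mids = (edges[:-1] + edges[1:]) / 2
    w0 = np.cumsum(p)            # probability mass at or below each cut
    m = np.cumsum(p * mids)      # cumulative first moment
    w1 = 1.0 - w0
    valid = (w0 > 0) & (w1 > 0)
    between = np.zeros(bins)
    between[valid] = (m[-1] * w0[valid] - m[valid]) ** 2 / (w0[valid] * w1[valid])
    return edges[1:][between.argmax()]

def otsu_foreground_density(img):
    """Fraction of pixels above the Otsu threshold (assumed foreground)."""
    return float((img > otsu_threshold(img)).mean())

def mean_patch_l2_norm(batch, k=3):
    """Mean L2 norm of mean-centered k x k patches from an (N, C, H, W) batch."""
    c = batch.shape[1]
    win = np.lib.stride_tricks.sliding_window_view(batch, (k, k), axis=(2, 3))
    patches = win.transpose(0, 2, 3, 1, 4, 5).reshape(-1, c * k * k)
    patches = patches - patches.mean(axis=1, keepdims=True)
    return float(np.linalg.norm(patches, axis=1).mean())
```

Both functions need only a single batch of inputs, which matches the summary's claim that the method runs once before the first forward pass.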
Topics
- Convolutional Neural Networks
- Weight Initialization
- Kaiming Initialization
- MiniBatchKMeans
- Image Classification
- Optimization Trajectories
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.