Pre-Warm: Input-Conditioned Weight Initialization for Convolutional Neural Networks

2026-06-24 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, medium

Summary

Pre-Warm is a data-conditioned weight initialization method for the first convolutional layer of a neural network, and it requires no additional training. Before the first forward pass, it takes a single training batch, extracts mean-centered local patches, clusters them with MiniBatchKMeans, and applies inverse Manhattan spatial weighting. The resulting centroids initialize half of the first-layer filters; the other half use Kaiming initialization. Hyperparameters are set by closed-form rules based on Otsu's foreground density for grayscale images and on the mean L2 norm of mean-centered patches for natural color images. In 8-seed paired experiments on five standard benchmarks (MNIST, Fashion-MNIST, CIFAR-10, SVHN, and CIFAR-100), Pre-Warm achieved statistically significant accuracy improvements over Kaiming initialization: p < 0.05 on all datasets, with p = 0.0007 on SVHN (8/8 wins) and p = 0.0033 on CIFAR-100 (7/8 wins). It adds negligible overhead and requires minimal code changes.
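
The summary reports win counts alongside p-values but does not name the statistical test. As a rough sanity check, the sketch below computes what win counts alone would give under an exact two-sided sign test. The helper name sign_test_p is hypothetical, and the sign test is used here only for illustration; it is not claimed to be the test the authors ran.

```python
from math import comb


def sign_test_p(wins, n):
    """Exact two-sided sign test p-value for `wins` out of n paired runs (no ties)."""
    k = max(wins, n - wins)
    # Probability of a result at least this one-sided under a fair coin.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(sign_test_p(8, 8))  # 0.0078125
print(sign_test_p(7, 8))  # 0.0703125
```

Both values are larger than the reported p = 0.0007 and p = 0.0033, so the reported figures presumably come from a paired test that also uses the size of the per-seed accuracy differences, not only their sign.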

Key takeaway

For machine learning engineers optimizing convolutional neural networks, Pre-Warm offers a simple way to improve accuracy at no training cost. If you currently use Kaiming initialization, consider using Pre-Warm to condition the first layer's weights on the input data. The reported gains are statistically significant on all five image benchmarks tested, and the method needs no architectural changes, adds negligible overhead, and takes only a few lines of code to improve the optimization trajectory.

Key insights

Conditioning a CNN's first-layer weights on input data can improve the optimization trajectory, yielding statistically significant accuracy gains over Kaiming initialization at zero training cost.

Principles

Data-driven initialization can enhance CNN performance.
First-layer filters benefit from local patch information.

Method

Extract mean-centered patches from a single batch, cluster with MiniBatchKMeans, apply inverse Manhattan weighting, and initialize half of the first-layer filters with centroids.
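
The steps above can be sketched with NumPy and scikit-learn as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the patch size and patch count, the mask form 1 / (1 + Manhattan distance to the patch center), applying the mask to the centroids after clustering, and using Kaiming normal with standard deviation sqrt(2 / fan_in) for the remaining filters are all assumptions, since the summary does not specify them.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans


def extract_patches(batch, k, max_patches, rng):
    """Sample k x k patches from a batch shaped (N, C, H, W) and mean-center each."""
    n, c, h, w = batch.shape
    idx = rng.integers(0, n, size=max_patches)
    ys = rng.integers(0, h - k + 1, size=max_patches)
    xs = rng.integers(0, w - k + 1, size=max_patches)
    patches = np.stack(
        [batch[i, :, y:y + k, x:x + k] for i, y, x in zip(idx, ys, xs)]
    )
    flat = patches.reshape(max_patches, -1).astype(np.float64)
    return flat - flat.mean(axis=1, keepdims=True)


def inverse_manhattan_mask(k):
    """Assumed weighting: 1 / (1 + Manhattan distance to the patch center)."""
    center = (k - 1) / 2.0
    ys, xs = np.mgrid[0:k, 0:k]
    return 1.0 / (1.0 + np.abs(ys - center) + np.abs(xs - center))


def pre_warm_filters(batch, out_channels, k=3, max_patches=2000, seed=0):
    """Return first-layer filters shaped (out_channels, C, k, k)."""
    rng = np.random.default_rng(seed)
    c = batch.shape[1]
    n_data = out_channels // 2  # half of the filters come from the data

    # Cluster mean-centered patches; centroids become candidate filters.
    patches = extract_patches(batch, k, max_patches, rng)
    km = MiniBatchKMeans(n_clusters=n_data, random_state=seed, n_init=3)
    km.fit(patches)
    centroids = km.cluster_centers_.reshape(n_data, c, k, k)

    # Assumption: the spatial mask is applied to centroids, per channel.
    centroids = centroids * inverse_manhattan_mask(k)

    # Remaining filters: Kaiming normal, std = sqrt(2 / fan_in).
    fan_in = c * k * k
    kaiming = rng.normal(
        0.0, np.sqrt(2.0 / fan_in), size=(out_channels - n_data, c, k, k)
    )
    return np.concatenate([centroids, kaiming], axis=0)
```

The returned array has shape (out_channels, in_channels, k, k), so it can be copied into a first-layer weight tensor of the same shape in whatever framework the model uses. No rescaling of the centroid filters is done here because the summary does not describe one.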

In practice

Apply Pre-Warm to the first CNN layer.
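Use Otsu's foreground density to set hyperparameters for grayscale images, and the mean L2 norm of mean-centered patches for natural color images.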
Integrate with existing training pipelines.
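
The two input statistics named in the summary can be computed as sketched below, using NumPy only. The function names are hypothetical, and treating pixels brighter than the Otsu threshold as foreground is an assumption that suits bright-on-dark data such as MNIST. The closed-form rules that map these statistics to hyperparameters are not given in the summary, so they are not reproduced here.

```python
import numpy as np


def otsu_threshold(values, bins=256):
    """Otsu's method: pick the threshold that maximizes between-class variance."""
    hist, edges = np.histogram(values, bins=bins)
    centers = (edges[:-1] + edges[1:]) / 2.0
    w0 = np.cumsum(hist).astype(np.float64)  # pixel count at or below each bin
    w1 = w0[-1] - w0                         # pixel count above each bin
    s0 = np.cumsum(hist * centers)           # running intensity sum
    m0 = s0 / np.maximum(w0, 1)              # mean of the lower class
    m1 = (s0[-1] - s0) / np.maximum(w1, 1)   # mean of the upper class
    between = w0 * w1 * (m0 - m1) ** 2
    return centers[np.argmax(between)]


def foreground_density(images):
    """Fraction of pixels above the Otsu threshold (assumes bright foreground)."""
    t = otsu_threshold(images.ravel())
    return float((images > t).mean())


def mean_patch_l2_norm(patches):
    """Mean L2 norm of patches after subtracting each patch's own mean."""
    flat = patches.reshape(len(patches), -1).astype(np.float64)
    centered = flat - flat.mean(axis=1, keepdims=True)
    return float(np.linalg.norm(centered, axis=1).mean())
```

For example, a batch in which a quarter of the pixels are 1.0 and the rest are 0.0 has a foreground density of 0.25.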

Topics

Convolutional Neural Networks
Weight Initialization
Kaiming Initialization
MiniBatchKMeans
Image Classification
Optimization Trajectories

Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.