Why Every Weight in a Neural Network Is Born Divided by the Square Root of n.
Summary
Initialization can make or break a neural network: if the starting weights are too large, the signal grows exponentially from layer to layer, and if they are too small, it shrinks toward zero, effectively "killing" the network before learning begins. This article explains the "divide by the square root of n" rule through a step-by-step numerical simulation. It follows a signal as it propagates through a network, first with unscaled weights, which lead to instability, and then with correctly scaled weights, which keep the signal's magnitude roughly constant across layers. The aim is an intuitive understanding of why this specific scaling factor is needed for stable signal propagation and effective learning.
Key takeaway
For machine learning engineers designing or debugging neural networks, weight initialization is one of the first things to check. Scale each layer's initial weights by dividing by the square root of n, the number of inputs to that layer, so that signals neither explode nor vanish on the first forward pass. This practice directly affects training stability and convergence, and it saves debugging time by ruling out a common root cause of poor model performance before training starts.
Key insights
Proper neural network weight initialization prevents exploding or vanishing signals, ensuring stable learning from the start.
Principles
- Unscaled initial weights cause signal instability.
- Stable signal propagation requires specific weight scaling.
- Intuitive understanding enhances formula retention.
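The scaling principle can be made concrete with a short variance calculation. This is a sketch under assumptions that are not stated in the summary: a single linear unit with output z, n inputs x_i that share the same distribution, and weights w_i that are independent of each other and of the inputs, with zero mean. The symbols z, w_i, and x_i are introduced here for illustration only.

```latex
z = \sum_{i=1}^{n} w_i x_i, \qquad
\operatorname{Var}(z) = n \,\operatorname{Var}(w)\, \mathbb{E}[x^2]
\quad\Longrightarrow\quad
\operatorname{Var}(w) = \frac{1}{n} \;\text{ gives }\; \operatorname{Var}(z) = \mathbb{E}[x^2].
```

Under these assumptions, a weight variance of 1/n, which is a standard deviation of 1 over the square root of n, keeps the output on the same scale as the input. Any other choice multiplies the scale by a constant factor at every layer.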
Method
The article demonstrates signal propagation through a neural network layer-by-layer using numerical examples, first with unscaled weights to show instability, then with scaled weights to illustrate stability.
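The kind of simulation described above might look like the following minimal sketch. It is not the original article's code: the function name `propagate`, the layer width of 256, the depth of 20, the purely linear layers, and the "too small" scale of 0.5 over the square root of n are all assumptions chosen for illustration.

```python
import numpy as np

def propagate(weight_std, n=256, depth=20, seed=0):
    """Push one random vector through `depth` linear layers.

    Returns the root-mean-square (RMS) magnitude of the signal at the
    input and after every layer. Hypothetical helper for illustration.
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)                     # input signal, RMS close to 1
    rms = [float(np.sqrt(np.mean(x ** 2)))]
    for _ in range(depth):
        # Fresh n-by-n weight matrix with the requested standard deviation.
        W = rng.standard_normal((n, n)) * weight_std
        x = W @ x                                  # linear layer, no activation
        rms.append(float(np.sqrt(np.mean(x ** 2))))
    return rms

n = 256
unscaled = propagate(weight_std=1.0, n=n)               # weights ~ N(0, 1)
scaled = propagate(weight_std=1.0 / np.sqrt(n), n=n)    # divided by sqrt(n)
too_small = propagate(weight_std=0.5 / np.sqrt(n), n=n) # shrunk too far

for name, trace in [("unscaled", unscaled), ("scaled", scaled), ("too small", too_small)]:
    print(f"{name:>9}: first layer RMS = {trace[1]:.3g}, last layer RMS = {trace[-1]:.3g}")
```

In this setup the unscaled run grows by roughly a factor of the square root of n per layer, the scaled run stays near its starting magnitude, and the run with weights that are too small decays toward zero.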
In practice
- Initialize each layer's weights by dividing them by the square root of n, the layer's number of inputs.
- Simulate signal flow to understand network dynamics.
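As a rough illustration of the first point, a layer's weight matrix could be initialized as in the sketch below. The helper name `init_weights` and the layer sizes are hypothetical, and drawing the unscaled weights from a standard normal distribution is an assumption rather than something the summary specifies.

```python
import numpy as np

def init_weights(n_in, n_out, rng=None):
    """Return an (n_out, n_in) weight matrix with std 1 / sqrt(n_in).

    Hypothetical helper: draws standard-normal values and divides by the
    square root of the number of inputs to the layer.
    """
    rng = np.random.default_rng() if rng is None else rng
    return rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)

# Example: a layer with 512 inputs and 128 outputs.
W = init_weights(512, 128, rng=np.random.default_rng(0))
print(W.shape)                       # shape of the weight matrix
print(W.std(), 1.0 / np.sqrt(512))   # empirical std next to the target std
```

Passing an explicit random generator keeps the initialization reproducible, which helps when comparing runs while debugging.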
Topics
- Neural Network Initialization
- Weight Initialization
- Exploding Gradients
- Vanishing Gradients
- Deep Learning Fundamentals
- Signal Propagation
Best for: AI Student, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Deep Learning on Medium.