Why Every Weight in a Neural Network Is Born Divided by the Square Root of n

Source: Deep Learning on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Intermediate, quick

Summary

Weight initialization matters because badly scaled starting weights make the signal grow or shrink from layer to layer until it explodes toward infinity or vanishes toward zero, effectively "killing" the network before learning begins. The article explains the "divide by the square root of n" rule through a step-by-step numerical simulation of a signal passing through a network: first with unscaled weights, where the signal becomes unstable, and then with correctly scaled weights, where it stays stable. The goal is an intuitive understanding of why this particular scaling factor is needed for network stability and effective learning.

Key takeaway

For machine learning engineers designing or debugging neural networks, weight initialization is one of the first things to check. Scale each layer's initial weights by dividing by the square root of n, where n is the number of inputs to that layer, so that signals neither explode nor vanish at the start of training. This practice directly affects training stability and convergence, and it addresses a common root cause of poor model performance before any data is processed, which can save significant debugging time.
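
The scaling rule in the takeaway can be sketched in a few lines of NumPy. This is a minimal illustration, not code from the original article: the helper name `init_weights`, the `(n_out, n_in)` weight layout, and the choice of a standard normal distribution are all assumptions made for the example.

```python
import numpy as np

def init_weights(n_in, n_out, rng=None):
    """Hypothetical helper: draw weights from N(0, 1), then divide by sqrt(n_in).

    n_in is the number of inputs feeding each unit (the "n" in the rule).
    """
    rng = np.random.default_rng() if rng is None else rng
    # Unscaled draw has standard deviation 1; dividing by sqrt(n_in)
    # gives each weight a standard deviation of 1 / sqrt(n_in).
    return rng.standard_normal((n_out, n_in)) / np.sqrt(n_in)

W = init_weights(n_in=1000, n_out=500, rng=np.random.default_rng(0))
print(W.shape)                  # (500, 1000)
print(W.std() * np.sqrt(1000))  # close to 1.0
```

Deep learning libraries ship their own initializers, so a hand-written helper like this is mainly useful for seeing what the rule does.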

Key insights

Proper neural network weight initialization prevents exploding or vanishing signals, ensuring stable learning from the start.
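
The reason the factor is the square root of n can be sketched with a short variance calculation. The assumptions here are not stated in the summary and are added for illustration: a single unit computes a weighted sum of n inputs, the weights and inputs are independent, and both have zero mean.

```latex
z = \sum_{i=1}^{n} w_i x_i
\quad\Longrightarrow\quad
\operatorname{Var}(z) = n \,\operatorname{Var}(w)\,\operatorname{Var}(x)

\operatorname{Var}(w) = \frac{1}{n}
\quad\Longrightarrow\quad
\operatorname{Var}(z) = \operatorname{Var}(x),
\qquad
\operatorname{std}(w) = \frac{1}{\sqrt{n}}
```

Under these assumptions, weights with unit variance multiply the signal's variance by n at every layer, while weights with standard deviation 1/sqrt(n) leave it unchanged.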

Method

The article demonstrates signal propagation through a neural network layer by layer using numerical examples, first with unscaled weights to show instability, then with scaled weights to show stability.
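
A simulation in the spirit of the one described above might look like the following. It is a sketch under stated assumptions rather than the article's own code: the layers are purely linear (no activation function), the width of 256, the depth of 20, and the function name `propagate` are all chosen here for illustration.

```python
import numpy as np

def propagate(n=256, depth=20, scaled=True, seed=0):
    """Pass a random signal through `depth` linear layers of width `n`.

    Returns the standard deviation of the signal at the input and
    after every layer. No activation is applied (an assumption).
    """
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(n)           # input signal, std close to 1
    stds = [x.std()]
    for _ in range(depth):
        W = rng.standard_normal((n, n))  # unscaled weights, std 1
        if scaled:
            W /= np.sqrt(n)              # the "divide by sqrt(n)" step
        x = W @ x
        stds.append(x.std())
    return stds

unscaled = propagate(scaled=False)
scaled = propagate(scaled=True)
for layer in (0, 1, 5, 10, 20):
    print(f"layer {layer:2d}  unscaled std = {unscaled[layer]:.3e}  "
          f"scaled std = {scaled[layer]:.3f}")
```

With unscaled weights the standard deviation grows by roughly a factor of sqrt(n) per layer, so it becomes astronomically large within a few layers; with scaled weights it stays near its starting value. Drawing weights with a standard deviation well below 1/sqrt(n) would show the opposite failure, a signal that shrinks toward zero.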

Best for: AI Student, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Deep Learning on Medium.