Dropout Neural Network Training Viewed from a Percolation Perspective

2024-11-26 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

A new study investigates whether "percolation" occurs, and what effect it has, when deep neural networks (NNs) are trained with dropout, the regularization technique introduced by G. Hinton et al. (2012). The authors model dropout's random removal of connections with new percolation models on rectangular layered networks, distinguishing bond percolation (DropConnect, which removes individual weights) from site percolation (original dropout, which removes whole nodes). They characterize how network topology (depth L, width W) determines the probability that a path exists between the input and output layers, and they establish critical behavior. The theory shows that this percolative effect can break down the training of NNs without biases, so that no learning takes place, and the authors extend the breakdown heuristically to NNs with biases. For deep networks, the number of training steps T(n) required to avoid the problem can grow exponentially, or even doubly exponentially, with depth.

Key takeaway

For AI scientists designing or training deep neural networks, this research highlights a "percolation problem": an excessively deep network, especially one without biases, can fail to learn because dropout leaves no surviving path from input to output. Consider the network's width-to-depth ratio when choosing an architecture, and budget training time accordingly. For very deep networks, the number of training steps needed to keep learning effective and avoid stalled parameters can grow exponentially, or even doubly exponentially, with depth.

Key insights

Dropout's random connection removal in deep NNs can lead to a "percolation problem" where input-output paths vanish, preventing learning.

Principles

Dropout's random filtering of connections is analogous to percolation in statistical physics.
Deep NNs with specific width-to-depth ratios exhibit critical percolation behavior.
When a dropout step leaves no input-output path in a network without biases, the gradient estimate for that step is zero and learning halts.
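The last principle can be checked numerically. The sketch below is an illustration only, not the paper's construction: it assumes a tiny bias-free ReLU network with squared loss, omits the usual rescaling of surviving activations, and uses the hypothetical helpers `forward` and `finite_difference_grads`. When the dropout mask removes every node of one hidden layer, the output no longer depends on any weight, so every component of the numerical gradient is zero.

```python
import random


def forward(weights, masks, x):
    """Bias-free ReLU network; masks[l] zeroes the dropped nodes of hidden layer l."""
    h = x
    last = len(weights) - 1
    for l, W in enumerate(weights):
        # Linear map without a bias term: z_j = sum_i W[j][i] * h[i]
        z = [sum(w * a for w, a in zip(row, h)) for row in W]
        if l < last:
            # Hidden layer: ReLU, then the dropout mask (site percolation)
            h = [max(0.0, z_j) * m_j for z_j, m_j in zip(z, masks[l])]
        else:
            h = z  # linear output layer
    return h


def finite_difference_grads(weights, masks, x, y, eps=1e-6):
    """Central-difference gradient of the squared loss w.r.t. every weight."""
    def loss():
        out = forward(weights, masks, x)
        return sum((o - t) ** 2 for o, t in zip(out, y))

    grads = []
    for W in weights:
        for row in W:
            for i in range(len(row)):
                old = row[i]
                row[i] = old + eps
                up = loss()
                row[i] = old - eps
                down = loss()
                row[i] = old  # restore the weight
                grads.append((up - down) / (2 * eps))
    return grads


rng = random.Random(0)
width = 3
# Positive weights and inputs keep every ReLU active, so only the mask matters.
weights = [
    [[rng.uniform(0.1, 1.0) for _ in range(width)] for _ in range(width)],
    [[rng.uniform(0.1, 1.0) for _ in range(width)] for _ in range(width)],
    [[rng.uniform(0.1, 1.0) for _ in range(width)]],
]
x, y = [1.0, 0.5, 2.0], [0.0]

open_masks = [[1, 1, 1], [1, 0, 1]]     # a path survives
blocked_masks = [[1, 1, 1], [0, 0, 0]]  # second hidden layer fully dropped

open_grads = finite_difference_grads(weights, open_masks, x, y)
blocked_grads = finite_difference_grads(weights, blocked_masks, x, y)
print("largest |gradient| with a path:   ", max(abs(g) for g in open_grads))
print("largest |gradient| without a path:", max(abs(g) for g in blocked_grads))
```

With a bias term in the blocked layer's successor, the output would still depend on that bias, which is why the summary treats the biased case separately and only heuristically.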

Method

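The authors define new percolation models on rectangular layered networks (bond percolation for DropConnect, site percolation for original dropout) and use them to characterize the crossing probability, that is, the probability that an input-output path survives, and its impact on NN training.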
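To make the two models concrete, the crossing probability can be estimated by simulation. The following is a minimal Monte Carlo sketch under stated assumptions, not the paper's definition: every layer has `width` nodes, consecutive layers are fully connected, each node (site model) or edge (bond model) survives independently with probability `keep`, and the first and last layers are treated like any other layer. The function names are hypothetical.

```python
import random


def crosses_site(depth, width, keep, rng):
    """Site percolation: each node survives independently with probability keep.

    With dense wiring between consecutive layers, an input-output path exists
    exactly when every layer keeps at least one node.
    """
    for _ in range(depth):
        if not any(rng.random() < keep for _ in range(width)):
            return False
    return True


def crosses_bond(depth, width, keep, rng):
    """Bond percolation: each edge survives independently with probability keep."""
    reachable = [True] * width  # every first-layer node is a source
    for _ in range(depth - 1):
        nxt = [False] * width
        for j in range(width):
            for i in range(width):
                # Draw the state of edge i -> j, then check its source is reachable.
                if rng.random() < keep and reachable[i]:
                    nxt[j] = True
        reachable = nxt
        if not any(reachable):
            return False
    return True


def crossing_probability(crosses, depth, width, keep, trials=2000, seed=0):
    """Fraction of random masks that leave at least one input-output path."""
    rng = random.Random(seed)
    hits = sum(crosses(depth, width, keep, rng) for _ in range(trials))
    return hits / trials


for depth in (2, 8, 32):
    site = crossing_probability(crosses_site, depth, 4, 0.5)
    bond = crossing_probability(crosses_bond, depth, 4, 0.5)
    print(f"depth={depth:2d}  site={site:.3f}  bond={bond:.3f}")
```

Running the loop for increasing `depth` at fixed `width` shows the estimated crossing probability falling as the network deepens, which is the qualitative effect the percolation models are built to quantify.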

In practice

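Tune the dropout probability 'p' so that enough connections survive each training step to keep the network away from the critical percolation breakdown.
Expect the number of training steps 'T(n)' to grow exponentially, or even doubly exponentially, with depth for very deep networks.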
Prioritize wider networks over excessively deep ones to maintain connectivity during dropout.
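The width-versus-depth advice can be turned into a back-of-the-envelope calculation. This sketch rests on a simplifying assumption that is not taken from the paper: site percolation on densely connected layers of equal width, where a path survives exactly when every layer keeps at least one node, so the crossing probability is (1 - (1 - keep)^W)^L. The helper names and the 0.99 target are illustrative choices only.

```python
import math


def site_crossing_probability(depth, width, keep):
    """Closed form under the dense-layer, site-percolation assumption."""
    layer_alive = 1.0 - (1.0 - keep) ** width  # at least one node survives
    return layer_alive ** depth                # every layer must survive


def min_width(depth, keep, target=0.99):
    """Smallest width whose crossing probability reaches the target."""
    if not 0.0 < keep <= 1.0:
        raise ValueError("keep must lie in (0, 1]")
    if not 0.0 < target < 1.0:
        raise ValueError("target must lie in (0, 1)")
    width = 1
    while site_crossing_probability(depth, width, keep) < target:
        width += 1
    return width


for depth in (10, 100, 1000, 10000):
    w = min_width(depth, keep=0.5)
    p = site_crossing_probability(depth, w, keep=0.5)
    print(f"depth={depth:5d}  min width={w:2d}  crossing prob={p:.4f}")

# At fixed width, the crossing probability decays geometrically with depth.
print(math.log(site_crossing_probability(200, 4, 0.5)))
```

Under this assumption the required width grows only logarithmically with depth, whereas at fixed width the crossing probability decays geometrically in depth, so a modest increase in width buys far more connectivity than the same effort spent elsewhere. The paper's own models and critical scalings may differ from this simplified picture.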

Topics

Neural Networks
Dropout Regularization
Percolation Theory
Stochastic Gradient Descent
Deep Learning Training
Network Topology

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.