The other paper that killed deep learning theory
Summary
The 2019 paper "Uniform convergence may be unable to explain generalization in deep learning" by Nagarajan and Kolter delivered a significant blow to deep learning theory, particularly the attempts to explain neural network generalization through data-dependent uniform convergence bounds. Following Zhang et al.'s 2016 finding that neural networks could memorize random data, researchers sought data-dependent bounds based on properties like spectral norm, weight norm, or margin. Nagarajan and Kolter empirically demonstrated that these proposed bounds were not only vacuous but scaled in the wrong direction, increasing as O(n) while actual generalization error decreased as O(1/√n) on MNIST. Furthermore, they theoretically constructed an overparameterized linear setting where two-sided uniform convergence bounds provably failed, and replicated this failure empirically with a shallow ReLU network on a hypersphere dataset. Their work suggests that deep learning generalization requires algorithm- and data-dependent explanations that move beyond worst-case empirical error bounding and account for both microscopic complexity and macroscopic simplicity.
Key takeaway
For research scientists developing deep learning theory, you should recognize that traditional uniform convergence bounds are insufficient to explain neural network generalization. Your efforts should shift towards developing highly algorithm- and data-dependent theories that can reconcile the observed microscopic complexity of decision boundaries with macroscopic generalization, moving beyond worst-case empirical error bounding to capture the nuanced behavior of SGD-trained networks.
Key insights
Uniform convergence bounds fail to explain deep learning generalization due to vacuous scaling and inability to capture microscopic complexity.
Principles
- Generalization bounds must be strongly algorithm- and data-dependent.
- Microscopic complexity can coexist with macroscopic simplicity.
Method
Nagarajan and Kolter empirically tested existing data-dependent generalization bounds, constructed a theoretical overparameterized linear model where uniform convergence fails, and demonstrated this failure empirically with a shallow ReLU network.
In practice
- Avoid uniform convergence bounds for deep learning generalization.
- Focus on algorithm- and data-specific generalization mechanisms.
Topics
- Deep Learning Theory
- Generalization Bounds
- Uniform Convergence
- Statistical Learning Theory
- Spectral Norm
Best for: Research Scientist, AI Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.