The other paper that killed deep learning theory

2026-04-27 · Source: AI Alignment Forum · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences, Data Science & Analytics · Depth: Expert, long

Summary

The 2019 paper "Uniform convergence may be unable to explain generalization in deep learning" by Nagarajan and Kolter delivered a significant blow to deep learning theory, particularly the attempts to explain neural network generalization through data-dependent uniform convergence bounds. Following Zhang et al.'s 2016 finding that neural networks could memorize random data, researchers sought data-dependent bounds based on properties like spectral norm, weight norm, or margin. Nagarajan and Kolter empirically demonstrated that these proposed bounds were not only vacuous but scaled in the wrong direction, increasing as O(n) while actual generalization error decreased as O(1/√n) on MNIST. Furthermore, they theoretically constructed an overparameterized linear setting where two-sided uniform convergence bounds provably failed, and replicated this failure empirically with a shallow ReLU network on a hypersphere dataset. Their work suggests that deep learning generalization requires algorithm- and data-dependent explanations that move beyond worst-case empirical error bounding and account for both microscopic complexity and macroscopic simplicity.

Key takeaway

For research scientists developing deep learning theory, you should recognize that traditional uniform convergence bounds are insufficient to explain neural network generalization. Your efforts should shift towards developing highly algorithm- and data-dependent theories that can reconcile the observed microscopic complexity of decision boundaries with macroscopic generalization, moving beyond worst-case empirical error bounding to capture the nuanced behavior of SGD-trained networks.

Key insights

Uniform convergence bounds fail to explain deep learning generalization due to vacuous scaling and inability to capture microscopic complexity.

Principles

Generalization bounds must be strongly algorithm- and data-dependent.
Microscopic complexity can coexist with macroscopic simplicity.

Method

Nagarajan and Kolter empirically tested existing data-dependent generalization bounds, constructed a theoretical overparameterized linear model where uniform convergence fails, and demonstrated this failure empirically with a shallow ReLU network.

In practice

Avoid uniform convergence bounds for deep learning generalization.
Focus on algorithm- and data-specific generalization mechanisms.

Topics

Deep Learning Theory
Generalization Bounds
Uniform Convergence
Statistical Learning Theory
Spectral Norm

Best for: Research Scientist, AI Scientist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Alignment Forum.