Uniform-in-time concentration in two-layer neural networks via transportation inequalities
Summary
A new study quantifies the discrepancy between the predictions of two-layer neural networks trained with stochastic gradient descent (SGD) and their mean-field limit. The bounds are uniform in time, hold with high probability, and are stated for quadratic loss with ridge regularization. A key step is establishing T_p transportation inequalities (p \in {1, 2}) for the law of the SGD parameters, with explicit constants that do not depend on the iteration index. From these, the study proves uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein distance W_1, and translates the bounds into prediction-error estimates against a fixed test function \Phi. Analogous concentration bounds are derived in the sliced-Wasserstein distance SW_1, yielding dimension-free rates.
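For readers less familiar with the objects named in the summary, the block below recalls their standard textbook definitions. The notation (the constant C, relative entropy H, empirical measure \mu_N, limit \mu, uniform measure \sigma on the sphere) is assumed here for illustration and may differ from the paper's. The third line additionally assumes that \Phi is Lipschitz, which the summary does not state.

```latex
% T_p(C): a measure \mu satisfies T_p with constant C if, for every \nu \ll \mu,
W_p(\nu, \mu) \le \sqrt{2C\, H(\nu \mid \mu)},
\qquad H(\nu \mid \mu) = \int \log\frac{d\nu}{d\mu}\, d\nu .

% Kantorovich-Rubinstein duality for W_1
W_1(\mu, \nu) = \sup_{\mathrm{Lip}(f) \le 1}
  \left| \int f \, d\mu - \int f \, d\nu \right| .

% Consequence for a Lipschitz test function \Phi (assumption)
\left| \int \Phi \, d\mu_N - \int \Phi \, d\mu \right|
  \le \mathrm{Lip}(\Phi)\, W_1(\mu_N, \mu) .

% Sliced-Wasserstein distance: average of one-dimensional W_1 over directions
SW_1(\mu, \nu) = \int_{S^{d-1}} W_1\big(\theta_{\#}\mu,\ \theta_{\#}\nu\big)\, d\sigma(\theta),
\qquad \theta_{\#}\mu = \text{law of } \langle \theta, X \rangle,\ X \sim \mu .
```

The duality line is what makes a W_1 concentration bound on the parameter measure usable as a bound on a prediction error.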
Key takeaway
For research scientists developing or analyzing two-layer neural networks, these uniform-in-time concentration bounds can sharpen theoretical guarantees for SGD-trained models. The transportation-inequality framework may also be useful when deriving prediction-error estimates and dimension-free rates.
Key insights
The study uses transportation inequalities to quantify how far the predictions of SGD-trained two-layer networks deviate from their mean-field limit.
Principles
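- T_p transportation inequalities (p \in {1, 2}) hold for the law of the SGD parameters, with explicit constants independent of the iteration index.
- The empirical parameter measure concentrates around its mean-field limit uniformly in time, with high probability.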
Method
The authors first establish T_p transportation inequalities for the law of the SGD parameters, then use them to prove uniform-in-time concentration of the empirical parameter measure in the Wasserstein and sliced-Wasserstein distances.
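To make the setting concrete, here is a minimal sketch of one SGD step for a two-layer network with quadratic loss and ridge regularization. Several choices are assumptions for illustration and are not taken from the paper: the tanh activation, the 1/N mean-field scaling of the output, one sample per step, and the names `predict` and `sgd_step`. The rows of `(a, W)` play the role of the particles whose empirical measure the concentration results describe.

```python
import numpy as np

def predict(a, W, x):
    """Mean-field prediction: average of a_i * tanh(w_i . x) over N neurons."""
    return np.mean(a * np.tanh(W @ x))

def sgd_step(a, W, x, y, lr, lam):
    """One SGD step on a single sample (x, y).

    Per-particle objective (assumed): 0.5 * (f(x) - y)^2 plus a ridge
    penalty (lam / 2) * (a_i^2 + |w_i|^2), with the step size rescaled
    by N as is usual in the mean-field regime.
    """
    act = np.tanh(W @ x)                 # activations, shape (N,)
    err = np.mean(a * act) - y           # prediction residual f(x) - y
    grad_a = err * act + lam * a         # gradient w.r.t. output weights
    # d/dw tanh(w . x) = (1 - tanh^2) * x, broadcast over neurons
    grad_W = (err * a * (1.0 - act**2))[:, None] * x[None, :] + lam * W
    return a - lr * grad_a, W - lr * grad_W

# Hypothetical usage: N = 100 neurons in dimension d = 5
rng = np.random.default_rng(0)
a, W = rng.normal(size=100), rng.normal(size=(100, 5))
for _ in range(50):
    x = rng.normal(size=5)
    a, W = sgd_step(a, W, x, y=np.sin(x[0]), lr=0.1, lam=0.01)
```

Stacking `a` and `W` column-wise gives an N-by-(d+1) array of particles, which is the natural input for comparing two parameter clouds in W_1 or SW_1.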
In practice
- Quantify prediction error against a fixed test function \Phi.
- Derive dimension-free rates using the sliced-Wasserstein distance SW_1.
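As an illustration of the sliced-Wasserstein distance mentioned above, the following sketch estimates SW_1 between two equal-size point clouds by Monte Carlo over random directions. It is a generic estimator, not the paper's procedure; the function name `sliced_w1`, the number of projections, and the equal-sample-size restriction are assumptions. It relies on the fact that, in one dimension, W_1 between two empirical measures with the same number of atoms is the mean absolute difference of the sorted samples.

```python
import numpy as np

def sliced_w1(X, Y, n_proj=200, seed=0):
    """Monte Carlo estimate of SW_1 between empirical measures of X and Y.

    X, Y: arrays of shape (n, d) with the same n (assumed for simplicity).
    """
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes X and Y have the same shape")
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Random directions, normalized to lie on the unit sphere S^{d-1}
    dirs = rng.normal(size=(n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project both clouds onto every direction, then sort each projection
    PX = np.sort(X @ dirs.T, axis=0)     # shape (n, n_proj)
    PY = np.sort(Y @ dirs.T, axis=0)
    # Mean over samples gives 1D W_1 per direction; mean over directions gives SW_1
    return float(np.mean(np.abs(PX - PY)))

# Hypothetical usage: two Gaussian clouds differing by a shift
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 10))
Y = rng.normal(size=(500, 10)) + 0.5
estimate = sliced_w1(X, Y)
```

Each projection costs only a sort, which is why sliced distances are cheap to estimate even when the parameter dimension is large.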
Topics
- Two-layer Neural Networks
- Stochastic Gradient Descent
- Mean-Field Limit
- Transportation Inequalities
- Wasserstein Distance
Best for: Research Scientist, AI Researcher, AI Scientist, AI Student
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.NE updates on arXiv.org.