Uniform-in-time concentration in two-layer neural networks via transportation inequalities

2026-03-03 · Source: cs.NE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

A new study quantifies the discrepancy between the predictions of a two-layer neural network trained with stochastic gradient descent (SGD) and those of its mean-field limit. The bound is uniform in time and holds with high probability, in the setting of quadratic loss with ridge regularization. A key step is establishing T_p transportation inequalities (p ∈ {1, 2}) for the law of the SGD parameters, with explicit constants that do not depend on the iteration index. From these, the study proves uniform-in-time concentration of the empirical parameter measure around its mean-field limit in the Wasserstein distance W_1 and translates the bounds into prediction-error estimates against a fixed test function Φ. Analogous concentration bounds in the sliced-Wasserstein distance SW_1 yield dimension-free rates.

Key takeaway

For research scientists developing or analyzing two-layer neural networks, the uniform-in-time concentration bounds can refine theoretical guarantees for SGD-trained models. The transportation-inequality framework gives prediction-error estimates that hold at every iteration with high probability, and the sliced-Wasserstein variant gives dimension-free rates.

Key insights

The study quantifies neural network prediction discrepancies from mean-field limits using transportation inequalities.

Principles

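The law of the SGD parameters satisfies T_p transportation inequalities (p ∈ {1, 2}) with explicit constants independent of the iteration index.
The empirical parameter measure concentrates around its mean-field limit in W_1, uniformly in time and with high probability.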
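
For readers less familiar with the notation, the standard textbook form of a T_p inequality and the duality step that turns a W_1 bound into a test-function bound can be written as follows. This is a sketch of the usual definitions, not the paper's statements: the constant convention (2C under the square root) and the Lipschitz assumption on Φ are assumptions not given in the summary.

```latex
% mu satisfies T_p(C): Wasserstein distance is controlled by relative entropy.
\[
  \mu \in T_p(C)
  \quad\Longleftrightarrow\quad
  W_p(\nu, \mu) \le \sqrt{2C\, H(\nu \mid \mu)}
  \quad \text{for all probability measures } \nu \ll \mu,
\]
\[
  H(\nu \mid \mu) = \int \log\frac{d\nu}{d\mu}\, d\nu .
\]
% Kantorovich-Rubinstein duality: a W_1 bound controls any Lipschitz test function.
\[
  \Bigl| \int \Phi \, d\mu - \int \Phi \, d\nu \Bigr|
  \le \operatorname{Lip}(\Phi)\, W_1(\mu, \nu).
\]
```

Since W_1 ≤ W_2, T_2(C) implies T_1(C) under this convention.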

Method

The method involves establishing T_p transportation inequalities for SGD parameters, then proving uniform-in-time concentration of empirical parameter measures in Wasserstein and sliced-Wasserstein distances.
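
To make the objects in the method concrete, here is a minimal sketch of the kind of particle system being analyzed: N neurons in mean-field (1/N) scaling, updated by one-sample SGD on a quadratic loss with a ridge penalty. The tanh neuron, the names `phi`, `predict`, and `sgd_step`, and the step-size scaling are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def phi(theta, x):
    """Hypothetical neuron: each row of theta is (a, w), output a * tanh(w . x)."""
    a, w = theta[:, 0], theta[:, 1:]
    return a * np.tanh(w @ x)

def predict(theta, x):
    """Mean-field scaled prediction: average of the N neuron outputs."""
    return phi(theta, x).mean()

def sgd_step(theta, x, y, lr, lam):
    """One SGD step on 0.5 * (predict - y)^2 + (lam / 2) * mean_i |theta_i|^2.

    The per-particle gradient is rescaled by N, a common mean-field convention
    (an assumption here), so every particle moves at an O(lr) rate.
    """
    a, w = theta[:, 0], theta[:, 1:]
    act = np.tanh(w @ x)
    err = (a * act).mean() - y
    grad_a = err * act + lam * a
    grad_w = (err * a * (1.0 - act ** 2))[:, None] * x[None, :] + lam * w
    new_theta = theta.copy()
    new_theta[:, 0] -= lr * grad_a
    new_theta[:, 1:] -= lr * grad_w
    return new_theta

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    theta = rng.normal(size=(200, 4))   # N = 200 particles, input dimension 3
    for _ in range(100):
        x = rng.normal(size=3)
        theta = sgd_step(theta, x, np.sin(x[0]), lr=0.05, lam=0.01)
    print(predict(theta, np.zeros(3)))
```

The rows of `theta` form the empirical parameter measure whose distance to the mean-field limit the concentration bounds control.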

In practice

Quantify prediction error against a fixed test function.
Derive dimension-free rates using sliced-Wasserstein distance.
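
As an illustration of the sliced distance itself, the sketch below estimates SW_1 between two empirical measures by averaging one-dimensional W_1 distances over random directions. It is a Monte Carlo approximation under stated assumptions (equal sample sizes, uniform weights, directions drawn uniformly on the sphere); the function name `sliced_w1` and its parameters are hypothetical, and the code does not reproduce the paper's bounds.

```python
import numpy as np

def sliced_w1(X, Y, n_proj=200, seed=0):
    """Monte Carlo estimate of SW_1 between the empirical measures of X and Y.

    X, Y: arrays of shape (n, d) with the same n (uniform weights assumed).
    """
    if X.shape != Y.shape:
        raise ValueError("this sketch assumes X and Y have the same shape")
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Normalized Gaussian vectors are uniformly distributed on the unit sphere.
    dirs = rng.normal(size=(n_proj, d))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Project, then sort each column: in 1D, W_1 between equal-size empirical
    # measures is the mean absolute difference of the sorted samples.
    px = np.sort(X @ dirs.T, axis=0)
    py = np.sort(Y @ dirs.T, axis=0)
    return np.abs(px - py).mean()

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(500, 10))
    Y = rng.normal(size=(500, 10)) + 0.5
    print(sliced_w1(X, Y))
```

Each projection reduces the problem to one dimension, where W_1 needs only a sort; this is the usual intuition for why sliced distances avoid the dimension dependence of W_1 estimation.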

Topics

Two-layer Neural Networks
Stochastic Gradient Descent
Mean-Field Limit
Transportation Inequalities
Wasserstein Distance

Best for: Research Scientist, AI Researcher, AI Scientist, AI Student


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.NE updates on arXiv.org.