Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime
Summary
This work systematically analyzes scaling laws for quadratic and diagonal neural networks in the feature learning regime, moving beyond linear models. Drawing on connections with matrix compressed sensing and LASSO, the authors derive a detailed phase diagram for the scaling exponents of the excess risk, showing crossovers and plateaus consistent with empirical observations. The analysis links these scaling regimes to the spectral properties of the trained network weights, giving theoretical support for the emergence of power-law tails in weight spectra and their connection to generalization performance. Extensive numerical experiments further indicate that the approximate message passing (AMP) state evolution equations remain predictive at finite size, beyond the asymptotic assumptions under which they are traditionally derived.
Key takeaway
For research scientists designing or analyzing shallow neural networks, understanding the derived phase diagram of excess risk and weight spectra is crucial. You should carefully tune regularization strength based on sample complexity and network architecture to navigate distinct scaling regimes, avoid harmful overfitting, and achieve Bayes-optimal generalization. Consider implementing pruning strategies, as they can also attain optimal error rates without manual regularization adjustments.
Key insights
A universal theoretical framework for neural scaling laws and weight spectra in shallow networks is established via sparse estimation.
Principles
- Neural scaling laws exhibit universal phase diagrams across network types.
- Weight spectra directly reflect underfitting, overfitting, and approximation errors.
- Optimal regularization avoids harmful overfitting and achieves Bayes-optimal rates.
Method
The method maps shallow neural network training with L2 weight decay to sparse vector (LASSO) and low-rank matrix (compressed sensing) estimation, then uses Approximate Message Passing (AMP) and state evolution for analysis.
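The link between L2 weight decay and LASSO can be checked numerically in the simplest case. The sketch below assumes a diagonal network whose effective weights are the elementwise product w = u * v (the parametrization is an assumption here, not taken from the paper) and uses the standard identity that the smallest weight-decay penalty 0.5 * (||u||^2 + ||v||^2) over all factorizations of w equals the L1 norm of w. The helper names `balanced_factors` and `weight_decay_penalty` are hypothetical.

```python
import numpy as np

def balanced_factors(w):
    """Return (u, v) with u * v == w and |u| == |v| == sqrt(|w|)."""
    root = np.sqrt(np.abs(w))
    return np.sign(w) * root, root

def weight_decay_penalty(u, v):
    """L2 weight decay on both layers: 0.5 * (||u||^2 + ||v||^2)."""
    return 0.5 * (np.sum(u ** 2) + np.sum(v ** 2))

rng = np.random.default_rng(0)
w = rng.normal(size=8)
w[:4] = 0.0  # make the effective weight vector sparse

u_bal, v_bal = balanced_factors(w)
l1_norm = np.sum(np.abs(w))
balanced_penalty = weight_decay_penalty(u_bal, v_bal)

# Rescaling u -> c * u, v -> v / c leaves u * v unchanged,
# but by the AM-GM inequality it can only increase the penalty.
scales = rng.uniform(0.2, 5.0, size=(100, w.size))
rescaled_penalties = np.array(
    [weight_decay_penalty(u_bal * c, v_bal / c) for c in scales]
)

print("L1 norm of w:           ", l1_norm)
print("balanced weight decay:  ", balanced_penalty)
print("min over rescalings:    ", rescaled_penalties.min())
```

Under this assumption, minimizing a data-fit loss plus weight decay over (u, v) has the same minimizers in w as an L1-penalized (LASSO) problem. The matrix analogue replaces the L1 norm with the nuclear norm, which is where the low-rank compressed sensing view comes from.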
In practice
- Pruning learned weights can achieve Bayes-optimal error rates.
- Tune regularization to avoid harmful overfitting.
- Analyze weight spectra to diagnose underfitting/overfitting.
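The spectral diagnostics and pruning mentioned above can be prototyped with a few lines of NumPy. This is a minimal sketch, not the paper's procedure: it assumes the spectrum of interest is the set of singular values of a weight matrix, estimates a power-law decay exponent with a plain log-log least-squares fit, and stands in for pruning with hard-thresholding of singular values. The function names `tail_exponent` and `prune_spectrum`, the synthetic matrix, and the threshold value are all hypothetical.

```python
import numpy as np

def tail_exponent(singular_values, k_min=1, k_max=None):
    """Estimate alpha in s_k ~ k^(-alpha) by a least-squares fit of
    log s_k against log k over ranks k_min..k_max (1-indexed)."""
    s = np.sort(np.asarray(singular_values, dtype=float))[::-1]
    k_max = len(s) if k_max is None else k_max
    ranks = np.arange(k_min, k_max + 1)
    vals = s[k_min - 1:k_max]
    slope, _ = np.polyfit(np.log(ranks), np.log(vals), 1)
    return -slope

def prune_spectrum(W, threshold):
    """Zero out singular values of W that do not exceed threshold."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    s_pruned = np.where(s > threshold, s, 0.0)
    return (U * s_pruned) @ Vt

# Synthetic weight matrix with an exact power-law spectrum s_k = k^(-1.5).
rng = np.random.default_rng(0)
d = 50
true_alpha = 1.5
spectrum = np.arange(1, d + 1, dtype=float) ** (-true_alpha)
Q1, _ = np.linalg.qr(rng.normal(size=(d, d)))
Q2, _ = np.linalg.qr(rng.normal(size=(d, d)))
W = (Q1 * spectrum) @ Q2.T

alpha_hat = tail_exponent(np.linalg.svd(W, compute_uv=False))
W_pruned = prune_spectrum(W, threshold=0.1)
kept = int(np.sum(np.linalg.svd(W_pruned, compute_uv=False) > 1e-8))

print("estimated tail exponent:", alpha_hat)
print("singular values kept after pruning:", kept)
```

On real trained weights the fit range (`k_min`, `k_max`) matters, since the bulk and the tail of the spectrum can follow different laws. Restricting the fit to the tail and checking the log-log plot by eye is a reasonable precaution before reading anything into the exponent.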
Topics
- Neural Scaling Laws
- Shallow Neural Networks
- Feature Learning
- Weight Spectra
- Approximate Message Passing
- Sparse Estimation
- Regularization
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.