Scaling Laws and Spectra of Shallow Neural Networks in the Feature Learning Regime

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

This work systematically analyzes scaling laws for quadratic and diagonal neural networks in the feature learning regime, moving beyond linear models. Drawing on connections with matrix compressed sensing and LASSO, the authors derive a detailed phase diagram for the scaling exponents of the excess risk, showing crossovers and plateau behaviors consistent with empirical observations. The analysis links these scaling regimes precisely to the spectral properties of the trained network weights, giving a theoretical account of the emergence of power-law tails in weight spectra and their connection to generalization performance. Extensive numerical experiments further indicate that the approximate message passing (AMP) state evolution equations remain accurate non-asymptotically, extending their predictive power beyond the traditional asymptotic assumptions.
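
A scaling exponent of the kind charted in such a phase diagram can be read off empirically as the slope of excess risk against sample size on log-log axes. The sketch below is a minimal illustration on synthetic data; the function names `fit_power_law` and `local_exponents`, the sample sizes, and the exponent 0.5 are placeholders chosen for this example, not values or code from the paper.

```python
import math

def fit_power_law(ns, risks):
    """Least-squares fit of risk ~ c * n**(-gamma) on log-log axes.

    Returns (gamma, c). Assumes all inputs are strictly positive.
    """
    xs = [math.log(n) for n in ns]
    ys = [math.log(r) for r in risks]
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, math.exp(my - slope * mx)

def local_exponents(ns, risks):
    """Exponent between consecutive points; a change in these values
    along n is how a crossover or plateau would show up."""
    pairs = list(zip(ns, risks))
    return [
        -(math.log(r2) - math.log(r1)) / (math.log(n2) - math.log(n1))
        for (n1, r1), (n2, r2) in zip(pairs, pairs[1:])
    ]

# Synthetic data (placeholder, not from the paper): risk = 2 * n**(-0.5)
ns = [100, 200, 400, 800, 1600]
risks = [2.0 * n ** -0.5 for n in ns]
gamma, c = fit_power_law(ns, risks)
print(f"gamma = {gamma:.3f}, prefactor = {c:.3f}")
```

A single global fit hides regime changes, which is why the local exponents are computed as well: on real measurements, a plateau would appear as local exponents near zero.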

Key takeaway

Research scientists designing or analyzing shallow neural networks can use the derived phase diagram of excess risk and weight spectra as a guide. Tune regularization strength to the sample complexity and network architecture to navigate the distinct scaling regimes, avoid harmful overfitting, and achieve Bayes-optimal generalization. Pruning strategies are worth considering too, since they can also attain optimal error rates without manual regularization adjustments.

Key insights

A universal theoretical framework for neural scaling laws and weight spectra in shallow networks is established via sparse estimation.

Principles

Neural scaling laws exhibit universal phase diagrams across network types.
Weight spectra directly reflect underfitting, overfitting, and approximation errors.
Optimal regularization avoids harmful overfitting and achieves Bayes-optimal rates.

Method

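The method maps the training of shallow neural networks with L2 weight decay onto two sparse estimation problems: sparse vector estimation (LASSO) for diagonal networks and low-rank matrix estimation (matrix compressed sensing) for quadratic networks. It then analyzes both with Approximate Message Passing (AMP) and the associated state evolution equations.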
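
For diagonal networks, the link between L2 weight decay and LASSO rests on a standard identity: if the effective weight is an elementwise product w = u * v, the smallest weight-decay penalty 0.5 * (||u||^2 + ||v||^2) over all factorizations of w equals the L1 norm of w. The sketch below checks this numerically. It assumes the common two-factor parametrization w = u * v, which this summary does not spell out, and all function names are hypothetical.

```python
import math

def l2_penalty(u, v):
    """Weight-decay penalty 0.5 * (||u||^2 + ||v||^2) on the two factors."""
    return 0.5 * (sum(a * a for a in u) + sum(b * b for b in v))

def l1_norm(w):
    return sum(abs(x) for x in w)

def balanced_factors(w):
    """Factor w = u * v elementwise with |u_i| = |v_i| = sqrt(|w_i|).

    By the AM-GM inequality this is the factorization of w that
    minimizes the weight-decay penalty.
    """
    u = [math.sqrt(abs(x)) for x in w]
    v = [math.copysign(math.sqrt(abs(x)), x) for x in w]
    return u, v

def rescaled_factors(w, c):
    """A different factorization of the same w: u -> c * u, v -> v / c."""
    u, v = balanced_factors(w)
    return [c * a for a in u], [b / c for b in v]

w = [1.5, -0.2, 0.0, 3.0]  # arbitrary illustrative weights
u, v = balanced_factors(w)
print(l2_penalty(u, v), l1_norm(w))           # equal up to rounding
print(l2_penalty(*rescaled_factors(w, 2.0)))  # strictly larger
```

The quadratic-network case follows the same pattern with matrices: the squared Frobenius norm of W equals the nuclear norm of W W^T, which is why weight decay there acts like a low-rank-promoting penalty.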

In practice

Pruning learned weights can achieve Bayes-optimal error rates.
Tune regularization to avoid harmful overfitting.
Analyze weight spectra to diagnose underfitting/overfitting.
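
One generic way to quantify a power-law tail in a weight spectrum is the Hill estimator applied to the largest eigenvalues or singular values of a trained weight matrix. The sketch below is a standard statistical tool swapped in for illustration, not the diagnostic used in the paper; `hill_tail_exponent`, the cutoff `k`, and the synthetic Pareto spectrum are assumptions of this example.

```python
import math

def hill_tail_exponent(values, k):
    """Hill estimate of alpha for a tail P(X > x) ~ x**(-alpha),
    using the k largest of the given positive values."""
    xs = sorted(values, reverse=True)
    if not 0 < k < len(xs):
        raise ValueError("k must satisfy 0 < k < len(values)")
    threshold = xs[k]  # the (k+1)-th largest value
    log_excess = sum(math.log(x / threshold) for x in xs[:k])
    return k / log_excess

# Synthetic stand-in for a spectrum (not real network weights):
# exact quantiles of a Pareto law with tail exponent alpha_true.
alpha_true = 2.0
n = 2000
spectrum = [(i / (n + 1)) ** (-1.0 / alpha_true) for i in range(1, n + 1)]

alpha_hat = hill_tail_exponent(spectrum, k=200)
print(f"estimated tail exponent: {alpha_hat:.2f}")
```

The estimate depends on `k`, so checking that it stays stable across a range of cutoffs is the usual sanity check. Note the convention: alpha here is the exponent of the survival function, which is one less than the exponent of the corresponding density tail.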

Topics

Neural Scaling Laws
Shallow Neural Networks
Feature Learning
Weight Spectra
Approximate Message Passing
Sparse Estimation
Regularization

Code references

SPOC-group/QuadraticNetPowerlaw

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.