How does feature learning reshape the function space?

2026-05-19 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

This work characterizes how the function space spanned by the features of a two-layer neural network evolves during gradient descent training. The authors prove that, in a high-dimensional proportional regime, the feature distribution after one large gradient step is well approximated by a Gaussian with a target-dependent spiked covariance. This induces a data-adaptive kernel that reshapes the function space and modifies its spectral structure. The analysis interprets feature learning as a distributional transformation in either parameter space or input space or, equivalently, as the introduction of a target-dependent kernel. Specifically, the learned kernel selectively amplifies eigenvalues aligned with the target direction and mixes the leading eigenfunctions, coupling the top radial mode with a target-aligned quadratic harmonic. For ReLU activation, the spike primarily transforms the top and linear eigenspaces, boosting the linear eigenfunction aligned with the target vector.

Key takeaway

For research scientists developing or analyzing neural network architectures, the main point is that gradient descent induces a data-adaptive kernel that selectively enhances signal-aligned directions. Account for this function-space evolution when designing training regimes: the step size controls how strongly the kernel is transformed and how the leading eigenfunctions mix, which may make it possible to "skip" early training phases or improve model alignment with target signals.

Key insights

Feature learning in two-layer neural networks deforms function space via a data-adaptive, target-dependent kernel.

Principles

Gradient descent induces data-adaptive kernel deformation.
Feature learning amplifies target-aligned eigenvalues.
Larger step sizes induce stronger function space transformations.
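
The principles above can be illustrated numerically. The following is a minimal NumPy sketch, not the paper's construction: it assumes isotropic Gaussian inputs, a linear single-index target y = x·beta, a fixed second layer with entries ±1/width, and illustrative sizes (d, n, width) that are not taken from the source. The helper names one_gradient_step and spike_report are hypothetical. Under these assumptions the first-layer gradient is close to rank one, so the updated weight covariance gains a spike along beta that grows with the step size. Because the second layer is scaled by 1/width here, step sizes of the order of the width are needed before the spike becomes visible.

```python
import numpy as np


def one_gradient_step(eta, d=64, n=2048, width=128, seed=0):
    """Take one full-batch gradient step on the first layer of a ReLU network."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)                        # unit target direction (assumed)
    X = rng.standard_normal((n, d))                     # isotropic Gaussian inputs (assumed)
    y = X @ beta                                        # linear single-index target (assumed)
    W0 = rng.standard_normal((width, d)) / np.sqrt(d)   # rows have norm close to 1
    a = rng.choice([-1.0, 1.0], size=width) / width     # fixed second layer (assumed scaling)

    pre = X @ W0.T                                      # pre-activations, shape (n, width)
    resid = np.maximum(pre, 0.0) @ a - y                # f(x_i) - y_i
    # Gradient of (1 / 2n) * sum_i resid_i^2 with respect to W, shape (width, d).
    G = ((resid[:, None] * (pre > 0)) * a[None, :]).T @ X / n
    return W0 - eta * G, G, beta


def spike_report(eta, **kwargs):
    """Summarize how strongly the updated weights are spiked along beta."""
    W1, G, beta = one_gradient_step(eta, **kwargs)
    cov = W1.T @ W1 / W1.shape[0]                       # empirical covariance of updated rows
    top_right = np.linalg.svd(G)[2][0]                  # leading right singular vector of G
    return {
        "along_target": float(beta @ cov @ beta),       # variance of the rows along beta
        "top_eigenvalue": float(np.linalg.eigvalsh(cov)[-1]),
        "alignment": float(abs(top_right @ beta)),      # |cos| between G's top direction and beta
    }


for eta in (0.0, 64.0, 256.0):
    print(eta, spike_report(eta))
```

In this toy setting the variance along beta and the top covariance eigenvalue both increase with eta, while the leading direction of the gradient stays close to beta. The exact scalings in the paper may differ.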

Method

The method approximates the post-update feature distribution by a Gaussian with a target-dependent spiked covariance, then analyzes the spectral structure of the resulting data-adaptive kernel via Taylor expansion and eigenfunction analysis.
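
As a rough illustration of the kernel view described above, the sketch below evaluates a ReLU random-feature kernel K(x, x') = E_w[relu(w·x) relu(w·x')] with weights w drawn from N(0, Sigma), using the standard degree-1 arc-cosine closed form applied to the transformed inputs Sigma^{1/2} x. The covariance Sigma = (I + theta * beta beta^T) / d, the spike strength theta, the sample sizes, and the helper names spiked_relu_kernel and linear_mode_energy are all assumptions made for illustration; the kernel analyzed in the paper may be normalized or parameterized differently. The script compares the empirical kernel energy g^T K g / n^2 of a linear function aligned with beta against one orthogonal to it.

```python
import numpy as np


def spiked_relu_kernel(X, beta, theta):
    """Gram matrix of E_w[relu(w.x) relu(w.x')] for w ~ N(0, (I + theta*beta*beta^T)/d)."""
    d = X.shape[1]
    # Apply Sigma^{1/2} = (I + (sqrt(1 + theta) - 1) * beta beta^T) / sqrt(d) to each row.
    U = (X + (np.sqrt(1.0 + theta) - 1.0) * np.outer(X @ beta, beta)) / np.sqrt(d)
    norms = np.linalg.norm(U, axis=1)
    scale = np.outer(norms, norms)
    cos = np.clip((U @ U.T) / scale, -1.0, 1.0)
    phi = np.arccos(cos)
    # Degree-1 arc-cosine kernel, halved so that it equals E[relu * relu].
    return scale * (np.sin(phi) + (np.pi - phi) * cos) / (2.0 * np.pi)


def linear_mode_energy(theta, d=32, n=600, seed=0):
    """Kernel energy of a target-aligned and an orthogonal linear function."""
    rng = np.random.default_rng(seed)
    beta = rng.standard_normal(d)
    beta /= np.linalg.norm(beta)             # target direction (assumed)
    v = rng.standard_normal(d)
    v -= (v @ beta) * beta                   # remove the component along beta
    v /= np.linalg.norm(v)                   # unit direction orthogonal to beta
    X = rng.standard_normal((n, d))          # isotropic Gaussian inputs (assumed)
    K = spiked_relu_kernel(X, beta, theta)
    g_target, g_orth = X @ beta, X @ v       # the two linear functions on the sample
    e_target = float(g_target @ K @ g_target) / n**2
    e_orth = float(g_orth @ K @ g_orth) / n**2
    return e_target, e_orth


for theta in (0.0, 2.0, 8.0):
    e_target, e_orth = linear_mode_energy(theta)
    print(theta, e_target, e_orth, e_target / e_orth)
```

With theta = 0 the two directions are treated almost identically, whereas a positive spike raises the energy of the target-aligned linear function relative to the orthogonal one. This is consistent with the summary's statement that, for ReLU, the spike boosts the linear eigenfunction aligned with the target vector.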

In practice

Consider data-adaptive kernels for neural network initialization.
Analyze eigenvalue amplification for model interpretability.

Topics

Feature Learning
Function Space Reshaping
Two-Layer Neural Networks
Gradient Descent Training
Data-Adaptive Kernels

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.