How does feature learning reshape the function space?
Summary
This work characterizes how the function space spanned by the features of a two-layer neural network evolves during gradient descent training. The authors prove that, in a high-dimensional proportional regime and after one large gradient step, the post-update feature distribution is well approximated by a Gaussian with a target-dependent spiked covariance. This induces a data-adaptive kernel that reshapes the function space and modifies its spectral structure. The analysis interprets feature learning as a distributional transformation in either parameter space or input space or, equivalently, as the introduction of a target-dependent kernel. Specifically, the spike selectively amplifies eigenvalues aligned with the target direction and mixes leading eigenfunctions, coupling the top radial mode with a target-aligned quadratic harmonic. For ReLU activation, the spike primarily transforms the top and linear eigenspaces, boosting the linear eigenfunction aligned with the target vector.
Key takeaway
For research scientists developing or analyzing neural network architectures, understanding how feature learning reshapes function space is crucial. This work demonstrates that gradient descent induces a data-adaptive kernel that selectively enhances signal-aligned directions. Consider this function-space evolution when designing training regimes, in particular how step size affects the kernel transformation and the mixing of eigenfunctions, since it may make it possible to "skip" early training phases or to improve model alignment with target signals.
Key insights
Feature learning in two-layer neural networks deforms function space via a data-adaptive, target-dependent kernel.
Principles
- Gradient descent induces data-adaptive kernel deformation.
- Feature learning amplifies target-aligned eigenvalues.
- Larger step sizes induce stronger function space transformations.
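The first and third principles can be illustrated with a minimal NumPy sketch. It is not the paper's construction: the network scaling `f(x) = (1/N) * sum_j a_j * relu(w_j . x)`, the `tanh` single-index target, the dimensions, and the helper name `one_step_update` are all assumptions chosen for illustration. Under these assumptions, one full-batch gradient step on the first-layer weights produces an update that is close to rank one, with its top right singular vector roughly aligned with the target direction `beta` and its size growing linearly with the step size `eta`.

```python
import numpy as np

def one_step_update(X, y, W0, a, eta):
    """Return W1 - W0 after one full-batch gradient step on the first layer.

    Hypothetical setup: f(x) = (1/N) * sum_j a_j * relu(w_j . x), squared loss.
    """
    n, N = X.shape[0], W0.shape[0]
    Z = X @ W0.T                                  # pre-activations, shape (n, N)
    residual = y - np.maximum(Z, 0.0) @ a / N     # y_i - f(x_i)
    # Gradient of L = (1/2n) * sum_i residual_i^2, using relu'(z) = 1[z > 0].
    grad = -((residual[:, None] * (Z > 0)) * a[None, :]).T @ X / (n * N)
    return -eta * grad

rng = np.random.default_rng(0)
n, d, N = 4000, 30, 256                           # illustrative sizes only
beta = rng.standard_normal(d)
beta /= np.linalg.norm(beta)                      # unit target direction
X = rng.standard_normal((n, d))
y = np.tanh(X @ beta)                             # assumed single-index target
W0 = rng.standard_normal((N, d)) / np.sqrt(d)     # random first-layer init
a = rng.choice([-1.0, 1.0], size=N)               # fixed second layer

results = {}
for eta in (1.0, 2.0):
    delta = one_step_update(X, y, W0, a, eta)
    _, S, Vt = np.linalg.svd(delta, full_matrices=False)
    results[eta] = {
        "spike": S[0],                            # size of the leading direction
        "alignment": abs(Vt[0] @ beta),           # overlap with the target
        "energy": S[0] ** 2 / np.sum(S ** 2),     # share of the update in rank one
    }

for eta, r in results.items():
    print(eta, r)
```

Doubling `eta` doubles the leading singular value of the update, which is the simplest sense in which a larger step produces a stronger deformation. The small `1/N` output scaling is a design choice that keeps the initial network output near zero, so the residual is dominated by the target.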
Method
The method approximates the post-update feature distribution by a Gaussian whose covariance carries a target-dependent spike, then analyzes the spectral structure of the resulting data-adaptive kernel via Taylor expansion and eigenfunction analysis.
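As a rough illustration of how a spiked covariance changes the induced kernel, the sketch below evaluates the ReLU random-feature kernel in closed form for weights drawn from `N(0, I + theta * v v^T)`. The spike strength `theta`, the direction `v`, the covariance scaling, and the helper names are assumptions for illustration, not the paper's exact parametrization. The closed form is the standard order-1 arc-cosine kernel applied to inputs mapped through the square root of the covariance.

```python
import numpy as np

def spiked_relu_kernel(X, v, theta):
    """E_w[relu(w.x) relu(w.x')] for w ~ N(0, I + theta * v v^T), in closed form."""
    v = v / np.linalg.norm(v)
    # For a unit vector v, Sigma^{1/2} = I + (sqrt(1 + theta) - 1) * v v^T.
    U = X + (np.sqrt(1.0 + theta) - 1.0) * np.outer(X @ v, v)
    norms = np.linalg.norm(U, axis=1)
    scale = np.outer(norms, norms)
    cos_phi = np.clip((U @ U.T) / scale, -1.0, 1.0)
    phi = np.arccos(cos_phi)
    # Order-1 arc-cosine kernel, normalized so that K(x, x) = ||Sigma^{1/2} x||^2 / 2.
    return scale * (np.sin(phi) + (np.pi - phi) * cos_phi) / (2.0 * np.pi)

def target_alignment(K, f):
    """Normalized quadratic form f^T K f / (||f||^2 * trace(K))."""
    return float(f @ K @ f / ((f @ f) * np.trace(K)))

rng = np.random.default_rng(0)
n, d = 200, 20                                    # illustrative sizes only
X = rng.standard_normal((n, d))
v = np.zeros(d)
v[0] = 1.0                                        # assumed target direction
f = X @ v                                         # target-aligned linear function

K_isotropic = spiked_relu_kernel(X, v, theta=0.0)
K_spiked = spiked_relu_kernel(X, v, theta=4.0)
align_isotropic = target_alignment(K_isotropic, f)
align_spiked = target_alignment(K_spiked, f)
print(align_isotropic, align_spiked)
```

In this toy setting the spiked kernel places more of its weight on the linear function aligned with `v` than the isotropic kernel does, which is in line with the summary's statement that, for ReLU, the spike boosts the target-aligned linear eigenfunction.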
In practice
- Consider data-adaptive kernels for neural network initialization.
- Analyze eigenvalue amplification for model interpretability.
Topics
- Feature Learning
- Function Space Reshaping
- Two-Layer Neural Networks
- Gradient Descent Training
- Data-Adaptive Kernels
Best for: Research Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.