Deep Learning as Neural Low-Degree Filtering: A Spectral Theory of Hierarchical Feature Learning

Source: stat.ML updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

Neural Low-Degree Filtering (Neural LoFi) is a theoretical framework that models learning in deep neural networks as an iterative spectral procedure, going beyond the lazy regime in which features stay close to their initialization. It posits that hierarchical feature learning proceeds layer by layer: each layer selects the directions with maximal low-degree correlation with the label. The result is a tractable surrogate for deep learning with a kernel-space interpretation. It explains how representations are selected, how concepts emerge at specific sample complexities, and how depth builds new features through low-degree compositionality. In mechanistic experiments on fully connected and convolutional architectures, Neural LoFi outperforms lazy random-feature baselines, recovers structured filters, and aligns with the features that gradient descent discovers early in training on real datasets such as CIFAR-10 and CelebA. The framework also gives a data-driven criterion for feature emergence, based on the residual effective dimension of the current kernel.

Key takeaway

For AI scientists and research scientists who want to understand or optimize the internal mechanisms of deep learning, Neural LoFi offers an interpretable lens. Its layer-wise spectral filtering and low-degree compositionality principles explain how complex features emerge and why depth is computationally advantageous. Consider applying its insights when designing more efficient architectures, particularly for initialization or feature pretraining, and use its emergence criterion to predict when new concepts become learnable as the amount of data grows.

Key insights

Neural LoFi explains deep learning's hierarchical feature acquisition via iterative, label-correlated spectral filtering.

Method

Each Neural LoFi iteration projects the current representation onto the leading eigen-directions of a label-weighted moment operator, then lifts the projected features through a nonlinear random feature map to form the next layer's input.
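
The project-then-lift loop described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the moment operator is assumed to be the empirical label-weighted second moment, "leading" is taken to mean largest absolute eigenvalue, and the lift is assumed to be a ReLU random-feature map. The function names (`label_weighted_moment`, `leading_directions`, `random_feature_lift`, `lofi_sketch`) and the toy target are hypothetical and chosen for illustration only.

```python
import numpy as np

def label_weighted_moment(H, y):
    # Assumed operator: (1/n) * sum_i (y_i - mean(y)) * h_i h_i^T.
    yc = y - y.mean()
    return (H * yc[:, None]).T @ H / H.shape[0]

def leading_directions(M, k):
    # The k eigenvectors of the symmetric matrix M with largest |eigenvalue|.
    evals, evecs = np.linalg.eigh(M)
    order = np.argsort(-np.abs(evals))[:k]
    return evecs[:, order]

def random_feature_lift(Z, width, rng):
    # Assumed lift: ReLU random features with Gaussian weights and uniform biases.
    W = rng.standard_normal((Z.shape[1], width)) / np.sqrt(Z.shape[1])
    b = rng.uniform(-1.0, 1.0, size=width)
    return np.maximum(Z @ W + b, 0.0)

def lofi_sketch(X, y, n_layers=2, k=2, width=64, seed=0):
    rng = np.random.default_rng(seed)
    H, directions = X, []
    for _ in range(n_layers):
        # Standardize the current representation before filtering.
        H = (H - H.mean(axis=0)) / (H.std(axis=0) + 1e-8)
        # Filter: keep the k directions most correlated with the label.
        U = leading_directions(label_weighted_moment(H, y), k)
        directions.append(U)
        # Lift: nonlinear random features of the projected inputs.
        H = random_feature_lift(H @ U, width, rng)
    return H, directions

# Toy data (hypothetical): a degree-2 target that depends on two coordinates.
data_rng = np.random.default_rng(1)
X = data_rng.standard_normal((5000, 20))
y = X[:, 0] * X[:, 1]

features, directions = lofi_sketch(X, y)

# Fraction of first-layer filter energy that lies on coordinates 0 and 1.
energy = np.sum(directions[0][:2, :] ** 2) / np.sum(directions[0] ** 2)
print(f"first-layer energy on the relevant coordinates: {energy:.3f}")
```

On this toy target the label-weighted second moment is concentrated on the two relevant coordinates, so the first-layer filters should place most of their energy there. The second layer repeats the same filtering on the lifted features, which is the sense in which depth composes low-degree steps.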

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.