A Regularization-Sharpness Tradeoff for Linear Interpolators
Summary
This research introduces a "regularization-sharpness tradeoff" for overparameterized linear regression, extending the Interpolating Information Criterion (IIC) to models with $\ell^{p}$ penalties, where $p \geq 1$. The classical bias-variance tradeoff breaks down in overparameterized settings, necessitating new model selection principles. The proposed framework decomposes the IIC into a regularization term, which quantifies the alignment of the regularizer and the interpolator, and a geometric sharpness term, which measures the effect of local perturbations on the interpolating manifold. The study provides a general expression for the IIC for $\ell^{p}$ regularizers ($p \geq 2$) and extends this to the LASSO interpolator with an $\ell^{1}$ regularizer, which induces stronger sparsity. Empirical results using real-world datasets with random Fourier features and polynomials validate the theory, demonstrating that these tradeoff terms effectively distinguish performant linear interpolators from weaker ones and that $\ell^1$ regularization can lead to a more pronounced decrease in the sharpness term.
Key takeaway
Research scientists working with overparameterized linear models should consider the Interpolating Information Criterion (IIC) and its regularization-sharpness tradeoff for model selection. The framework targets models that perfectly interpolate the training data, the regime in which the classical bias-variance tradeoff breaks down. $\ell^1$ regularization deserves particular attention: it can produce a more pronounced decrease in the sharpness term, which the framework associates with stronger interpolators in high-dimensional settings.
Key insights
A regularization-sharpness tradeoff replaces bias-variance in overparameterized linear models, decomposing model selection into alignment and local perturbation effects.
Principles
- Classical information criteria fail in overparameterized settings.
- IIC decomposes into regularization and sharpness terms.
- Sparsity-inducing $\ell^1$ regularization can significantly reduce sharpness.
Method
The Interpolating Information Criterion (IIC) is decomposed into regularization and sharpness terms. Bayesian duality is used to approximate marginal likelihoods for $\ell^p$ regularizers ($p \geq 2$) and $\ell^1$ (LASSO) interpolators, with empirical validation on datasets using random Fourier features and polynomials.
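To make the setting concrete, here is a minimal sketch of an overparameterized linear interpolator built on random Fourier features. It assumes the $\ell^2$ case is the minimum-norm solution of $\Phi\theta = y$, computed with a pseudoinverse. The data, feature count, lengthscale, and function names are illustrative assumptions, not taken from the paper, and the code does not compute the IIC itself, whose closed-form expressions are not reproduced in this summary.

```python
import numpy as np

def random_fourier_features(x, n_features, lengthscale=0.1, seed=0):
    """Map inputs of shape (n, k) to random Fourier features of shape (n, n_features)."""
    rng = np.random.default_rng(seed)
    # Random frequencies and phases (illustrative choice of a Gaussian kernel).
    W = rng.normal(scale=1.0 / lengthscale, size=(x.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(x @ W + b)

def min_l2_interpolator(Phi, y):
    """Minimum l2-norm solution of Phi @ theta = y (requires full row rank)."""
    return np.linalg.pinv(Phi) @ y

# Synthetic toy data: 20 samples, 200 features, so the model is overparameterized.
n, d = 20, 200
x = np.linspace(-1.0, 1.0, n).reshape(-1, 1)
y = np.sin(3.0 * x[:, 0]) + 0.1 * np.random.default_rng(1).normal(size=n)

Phi = random_fourier_features(x, d)
theta = min_l2_interpolator(Phi, y)
max_residual = float(np.max(np.abs(Phi @ theta - y)))

# Any other interpolator differs from theta by a null-space vector of Phi,
# which can only increase the l2 norm.
v = np.random.default_rng(2).normal(size=d)
v_null = v - np.linalg.pinv(Phi) @ (Phi @ v)
theta_other = theta + v_null

print("max training residual:", max_residual)
print("norm of min-norm interpolator:", np.linalg.norm(theta))
print("norm of another interpolator:", np.linalg.norm(theta_other))
```

Both `theta` and `theta_other` fit the training data exactly, which is why training error alone cannot rank interpolators and a criterion such as the IIC is needed to choose among them.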
In practice
- Use IIC for model selection in overparameterized linear regression.
- Consider $\ell^1$ regularization, which can reduce the sharpness term more markedly.
- Analyze regularization and sharpness terms to understand model generalization.
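As a companion to the points above, the following sketch contrasts a minimum $\ell^1$-norm interpolator with a minimum $\ell^2$-norm one on the same data. It assumes the $\ell^1$ interpolator can be posed as basis pursuit (minimize $\|\theta\|_1$ subject to $\Phi\theta = y$) and solved as a linear program with SciPy's `linprog`. The Chebyshev polynomial features, the toy data, and the helper name `min_l1_interpolator` are illustrative assumptions. The sketch only shows the sparsity difference; it does not evaluate the regularization or sharpness terms.

```python
import numpy as np
from numpy.polynomial import chebyshev
from scipy.optimize import linprog

def min_l1_interpolator(Phi, y):
    """Basis pursuit: minimize ||theta||_1 subject to Phi @ theta = y."""
    n, d = Phi.shape
    # Split theta = u - v with u, v >= 0, so that ||theta||_1 = sum(u) + sum(v).
    c = np.ones(2 * d)
    A_eq = np.hstack([Phi, -Phi])
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs-ds")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:d] - res.x[d:]

# Synthetic toy data: 15 samples, 60 Chebyshev polynomial features.
n, d = 15, 60
x = np.linspace(-1.0, 1.0, n)
y = np.sin(3.0 * x) + 0.1 * np.random.default_rng(0).normal(size=n)
Phi = chebyshev.chebvander(x, d - 1)  # shape (n, d)

theta_l1 = min_l1_interpolator(Phi, y)
theta_l2 = np.linalg.pinv(Phi) @ y  # minimum l2-norm interpolator

# Count coefficients that are nonzero up to a small tolerance.
nnz_l1 = int(np.sum(np.abs(theta_l1) > 1e-8))
nnz_l2 = int(np.sum(np.abs(theta_l2) > 1e-8))

print("nonzero coefficients (l1):", nnz_l1)
print("nonzero coefficients (l2):", nnz_l2)
print("max residual (l1):", float(np.max(np.abs(Phi @ theta_l1 - y))))
```

A vertex solution of this linear program has at most `n` nonzero coefficients, which is why the dual simplex method (`"highs-ds"`) is requested here. The minimum $\ell^2$-norm solution is generally dense.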
Topics
- Regularization-Sharpness Tradeoff
- Interpolating Information Criterion
- Overparameterized Models
- $\ell^p$ Regularization
- LASSO Interpolators
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.