Generalization in Nonlinear Least Squares via Learned Feature Geometry

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

The paper "Generalization in Nonlinear Least Squares via Learned Feature Geometry" studies generalization in ridge-regularized nonlinear least-squares models. It derives error bounds for local minimizers via on-average algorithmic stability and expresses them through a data-dependent effective dimension. This dimension captures the geometry of the model's gradient features at the trained parameters by combining the empirical Jacobian Gram matrix with a residual-curvature term. Unlike neural tangent kernel (NTK) analyses, which work with the kernel at initialization, this effective dimension is evaluated at the trained model. The authors further bound it by the covering complexity of the gradient features, which yields guarantees driven by learned geometry rather than by parameter count. For data supported on a manifold, the bounds scale with the intrinsic dimension; for one-hidden-layer ReLU networks, the mechanism operates through activation-stable regions. Experiments confirm that the trained Jacobian is compressed and that the bounds agree with the observed generalization gaps. The derivation proceeds from first principles and uses the Brascamp–Lieb inequality.
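
To make the effective dimension concrete, here is a minimal NumPy sketch for a one-hidden-layer ReLU network f(x) = sum_j a_j relu(w_j . x). This summary does not give the paper's exact definition, so the formula below is our assumption: we take the ridge objective (1/2n) sum_i (f(x_i) - y_i)^2 + (lam/2) ||theta||^2, whose Hessian is A + lam*I with A = (1/n) J^T J + (1/n) sum_i r_i Hess f(x_i) (Jacobian Gram part plus residual-curvature part), and use the common ridge-style statistic d_eff = tr(A (A + lam*I)^-1). The function names relu_net, relu_net_jacobian, residual_curvature and effective_dimension are hypothetical.

```python
import numpy as np


def relu_net(X, W, a):
    """f(x) = sum_j a_j * relu(w_j . x) for every row x of X."""
    return np.maximum(X @ W.T, 0.0) @ a


def relu_net_jacobian(X, W, a):
    """Per-sample parameter Jacobian, parameters flattened as [a, W.ravel()].

    Returns J with shape (n, m + m*d); row i is the gradient feature of x_i.
    """
    pre = X @ W.T                                   # (n, m) pre-activations
    act = (pre > 0).astype(X.dtype)                 # (n, m) activation pattern
    dfda = np.maximum(pre, 0.0)                     # df/da_j = relu(w_j . x)
    dfdW = (act * a)[:, :, None] * X[:, None, :]    # df/dw_j = a_j 1[w_j . x > 0] x
    return np.concatenate([dfda, dfdW.reshape(X.shape[0], -1)], axis=1)


def residual_curvature(X, W, a, r):
    """(1/n) * sum_i r_i * Hess_theta f(x_i), same parameter ordering.

    For this network only the a_j / w_j cross block is nonzero:
    d2f / (da_j dw_j) = 1[w_j . x > 0] x. The W-W block vanishes almost
    everywhere for ReLU and the a-a block is identically zero.
    """
    n, d = X.shape
    m = W.shape[0]
    act = (X @ W.T > 0).astype(X.dtype)             # (n, m)
    C = (act * r[:, None]).T @ X / n                # (m, d) residual-weighted cross terms
    H = np.zeros((m + m * d, m + m * d))
    for j in range(m):
        cols = slice(m + j * d, m + (j + 1) * d)
        H[j, cols] = C[j]
        H[cols, j] = C[j]
    return H


def effective_dimension(X, y, W, a, lam):
    """Hypothetical d_eff = tr(A (A + lam I)^-1) with
    A = (1/n) J^T J + residual-curvature term, evaluated at (W, a)."""
    n = X.shape[0]
    J = relu_net_jacobian(X, W, a)
    r = relu_net(X, W, a) - y                       # residuals of the squared loss
    A = J.T @ J / n + residual_curvature(X, W, a, r)
    mu = np.linalg.eigvalsh(A)                      # A is symmetric
    # A + lam I is the ridge Hessian: PSD at any local minimizer. The trace
    # formula sum mu / (mu + lam) needs it strictly positive definite.
    if np.min(mu + lam) <= 0:
        raise ValueError("A + lam*I is not positive definite at these parameters")
    return float(np.sum(mu / (mu + lam)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, m = 50, 3, 20                             # p = m + m*d = 80 parameters
    X = rng.standard_normal((n, d))
    W = rng.standard_normal((m, d))
    a = rng.standard_normal(m) / np.sqrt(m)
    y = relu_net(X, W, a)                           # zero residuals: curvature term vanishes
    print("d_eff:", effective_dimension(X, y, W, a, lam=1e-2), "of", m + m * d)
```

Evaluating the same statistic at initialization and again after training is one way to look for the trained-Jacobian compression the experiments report. Because d_eff counts only eigen-directions of A that are large relative to lam, it can sit far below the parameter count even when p exceeds n.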

Key takeaway

For AI scientists evaluating nonlinear models, this research suggests looking beyond parameter counts when seeking generalization guarantees. Your analysis should consider data-dependent effective dimensions that reflect learned feature geometry, built from quantities such as the empirical Jacobian Gram matrix and the residual-curvature term. This view gives a finer-grained account of model performance, especially for architectures such as ReLU networks, because it ties generalization directly to the data's intrinsic dimension and to activation stability.
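
The activation-stability point can be illustrated with a small sketch (our own construction, not the paper's experiment). Within a region where the activation pattern of a one-hidden-layer ReLU network is fixed, each per-sample gradient feature is a fixed linear map of x, so data lying in a k-dimensional subspace contributes at most k Jacobian directions per region. A linear subspace stands in for a manifold here, and the helper names gradient_features and activation_regions are hypothetical.

```python
import numpy as np


def gradient_features(X, W, a):
    """Rows are per-sample parameter gradients of f(x) = sum_j a_j relu(w_j . x),
    parameters ordered as [a, W.ravel()]."""
    pre = X @ W.T
    act = (pre > 0).astype(X.dtype)
    dfdW = (act * a)[:, :, None] * X[:, None, :]
    return np.concatenate([np.maximum(pre, 0.0), dfdW.reshape(len(X), -1)], axis=1)


def activation_regions(X, W):
    """Group sample indices by their ReLU activation pattern 1[W x > 0]."""
    patterns = X @ W.T > 0
    regions = {}
    for i, pat in enumerate(patterns):
        regions.setdefault(pat.tobytes(), []).append(i)
    return list(regions.values())


rng = np.random.default_rng(0)
n, D, k, m = 500, 10, 2, 64                        # ambient dim D, intrinsic dim k, width m
U, _ = np.linalg.qr(rng.standard_normal((D, k)))   # orthonormal basis of a k-dim subspace
X = rng.standard_normal((n, k)) @ U.T              # data supported on that subspace
W = rng.standard_normal((m, D))
a = rng.standard_normal(m) / np.sqrt(m)

J = gradient_features(X, W, a)
regions = activation_regions(X, W)
rank = np.linalg.matrix_rank(J)
# Inside one region the features are a fixed linear map of x, so each region
# contributes at most k directions: rank(J) <= k * (number of regions).
print(f"params={J.shape[1]}  samples={n}  regions={len(regions)}  rank(J)={rank}")
```

With k = 2, the m hyperplanes w_j . x = 0 cut the data plane into at most 2m sectors, so rank(J) is capped at 4m, well below the parameter count m + m*D, no matter how many samples are drawn.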

Key insights

New generalization bounds for nonlinear least-squares models depend on learned feature geometry and a data-dependent effective dimension, not just on parameter count.

Method

Error bounds for local minimizers are derived via on-average algorithmic stability and stated in terms of an effective dimension built from the empirical Jacobian Gram matrix and a residual-curvature term. This dimension is in turn bounded using the covering complexity of the gradient features.
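
The summary does not spell out how gradient feature covering complexity is defined. As a hedged proxy only, the sketch below counts how many eps-balls are needed to cover the unit-normalized per-sample gradient features of a one-hidden-layer ReLU network, using a greedy maximal eps-separated subset. The normalization, the greedy packing, the circle-versus-Gaussian data, and the names unit_rows and greedy_cover are all our assumptions, not the paper's construction.

```python
import numpy as np


def gradient_features(X, W, a):
    """Per-sample parameter gradients of f(x) = sum_j a_j relu(w_j . x),
    parameters ordered as [a, W.ravel()]."""
    pre = X @ W.T
    act = (pre > 0).astype(X.dtype)
    dfdW = (act * a)[:, :, None] * X[:, None, :]
    return np.concatenate([np.maximum(pre, 0.0), dfdW.reshape(len(X), -1)], axis=1)


def unit_rows(F):
    """Drop all-zero rows (every unit inactive) and scale the rest to unit norm."""
    norms = np.linalg.norm(F, axis=1)
    keep = norms > 0
    return F[keep] / norms[keep, None]


def greedy_cover(F, eps):
    """Indices of a greedy maximal eps-separated subset of the rows of F.

    Every row lies within eps of some chosen center, so the centers form an
    eps-cover, and their count M satisfies N(eps) <= M <= N(eps / 2), where
    N(.) is the covering number of the row set.
    """
    centers = []
    for i in range(len(F)):
        if not centers or np.linalg.norm(F[centers] - F[i], axis=1).min() > eps:
            centers.append(i)
    return centers


rng = np.random.default_rng(0)
n, D, m = 400, 10, 32
W = rng.standard_normal((m, D))
a = rng.standard_normal(m) / np.sqrt(m)

t = rng.uniform(0.0, 2.0 * np.pi, n)
X_curve = np.zeros((n, D))                      # 1-D data: a circle in two coordinates
X_curve[:, 0], X_curve[:, 1] = np.cos(t), np.sin(t)
X_full = rng.standard_normal((n, D))            # full-dimensional Gaussian data

for name, X in [("circle", X_curve), ("gaussian", X_full)]:
    F = unit_rows(gradient_features(X, W, a))
    counts = [len(greedy_cover(F, eps)) for eps in (0.4, 0.2, 0.1)]
    print(name, "cover sizes at eps = 0.4, 0.2, 0.1:", counts)
```

For a set of intrinsic dimension k, covering numbers typically grow like eps^(-k) as eps shrinks, so slower growth for the circle than for the Gaussian data is the kind of signal a covering-based complexity measure is meant to capture.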

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.