Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization

Source: stat.ML updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

The paper "Characterization of Gaussian Universality Breakdown in High-Dimensional Empirical Risk Minimization" studies high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min–Max Theorem (CGMT) to non-Gaussian settings, it derives an asymptotic min–max characterization of key statistics, which makes it possible to approximate the mean μ_θ̂ and covariance C_θ̂ of the ERM estimator θ̂. Specifically, for a test covariate x, the projection θ̂ᵀx approximately follows the convolution of the (generally non-Gaussian) distribution of μ_θ̂ᵀx with an independent centered Gaussian variable. This clarifies the scope and limits of Gaussian universality for ERM and shows that it breaks down in cases such as bimodal features (Figure 1). The work also proves that any C² (twice continuously differentiable) regularizer is asymptotically equivalent to a quadratic form. Numerical simulations across diverse losses and models validate these theoretical predictions.
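The convolution statement in the summary can be written compactly as follows. This is a sketch of the claim as summarized here, not a formula quoted from the paper: the scale σ is a placeholder symbol introduced for illustration, assumed to be determined by the covariance of the estimator, and g is the independent centered Gaussian variable.

```latex
% Test score: non-Gaussian mean part plus independent Gaussian fluctuation
\hat\theta^{\top} x \;\overset{d}{\approx}\; \mu_{\hat\theta}^{\top} x \;+\; \sigma\, g,
\qquad g \sim \mathcal{N}(0,1) \ \text{independent of } x .

% Equivalent density form: law of the mean part convolved with a Gaussian kernel
p_{\hat\theta^{\top} x}(t) \;\approx\; \int p_{\mu_{\hat\theta}^{\top} x}(s)\,
\frac{1}{\sqrt{2\pi\sigma^{2}}}\,
\exp\!\left(-\frac{(t-s)^{2}}{2\sigma^{2}}\right) \mathrm{d}s .
```

Gaussian universality corresponds to the special case in which the law of the mean part is itself (approximately) Gaussian, so that the convolution stays Gaussian; a bimodal mean part keeps the score bimodal.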

Key takeaway

AI Scientists and Research Scientists developing high-dimensional models should critically evaluate Gaussian universality assumptions, especially when the data are non-Gaussian. This work provides a framework for predicting when those assumptions break down, which matters because a breakdown changes performance metrics such as classification error. The derived fixed-point equations characterize estimator behavior and test score distributions without relying on a Gaussian proxy, giving more accurate predictions of model performance.
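To make the gap between a Gaussian proxy and a non-Gaussian score distribution concrete, here is a minimal numerical sketch. It does not implement the paper's fixed-point equations. It only samples a hypothetical bimodal mean part, adds an independent centered Gaussian term, and compares the result with a Gaussian that has the same mean and variance. The function names and all numeric values (`mode_shift`, `noise_std`, the band width) are assumptions chosen for illustration.

```python
import numpy as np


def simulate_scores(n_samples=200_000, mode_shift=2.0, noise_std=0.5, seed=0):
    """Sample a toy test score and a moment-matched Gaussian proxy.

    The 'mean part' is a hypothetical stand-in for the projection of the
    estimator mean onto a bimodal covariate; it is not derived from the paper.
    """
    rng = np.random.default_rng(seed)
    # Bimodal mean part: two modes at +/- mode_shift with small spread.
    signs = rng.choice([-1.0, 1.0], size=n_samples)
    mean_part = mode_shift * signs + 0.2 * rng.standard_normal(n_samples)
    # Independent centered Gaussian fluctuation.
    gauss_part = noise_std * rng.standard_normal(n_samples)
    scores = mean_part + gauss_part
    # Gaussian proxy matching the first two empirical moments of the scores.
    proxy = scores.mean() + scores.std() * rng.standard_normal(n_samples)
    return scores, proxy


def excess_kurtosis(values):
    """Empirical excess kurtosis (zero for an exact Gaussian)."""
    centered = values - values.mean()
    return float(np.mean(centered**4) / np.mean(centered**2) ** 2 - 3.0)


scores, proxy = simulate_scores()

# Fraction of scores inside a band around a decision boundary at zero.
band = 1.0
mass_conv = float(np.mean(np.abs(scores) < band))
mass_proxy = float(np.mean(np.abs(proxy) < band))

print("excess kurtosis, convolution:", excess_kurtosis(scores))
print("excess kurtosis, Gaussian proxy:", excess_kurtosis(proxy))
print("mass within band, convolution vs proxy:", mass_conv, mass_proxy)
```

In this toy setup the two distributions share their mean and variance, yet the Gaussian proxy places far more mass near the decision boundary than the bimodal convolution does, so any metric that depends on the score distribution near a threshold would be mispredicted by the proxy.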

Key insights

Gaussian universality in high-dimensional ERM is not universal; its breakdown can be precisely characterized for non-Gaussian data.

Principles

Method

The paper extends CGMT to non-Gaussian designs by reformulating ERM as a min–max problem. It then derives a fixed-point system that characterizes the asymptotic mean μ∗ and variance α∗ of the estimator.
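The min–max reformulation typically starts from Fenchel duality applied to the loss. The sketch below shows that standard first step under assumed notation: the loss ℓ, regularizer r, design matrix X with rows x_i, dual vector u, and the 1/n normalization are choices made here for illustration, and the paper's exact formulation may differ.

```latex
% ERM objective (assumed normalization)
\hat\theta \in \arg\min_{\theta \in \mathbb{R}^{d}}
\frac{1}{n}\sum_{i=1}^{n} \ell\!\left(y_i,\, x_i^{\top}\theta\right) + r(\theta).

% Convex conjugate of the loss in its second argument
\ell^{*}(y,u) = \sup_{z \in \mathbb{R}} \left\{ u z - \ell(y,z) \right\},
\qquad
\ell(y,z) = \max_{u \in \mathbb{R}} \left\{ u z - \ell^{*}(y,u) \right\}
\quad \text{(for closed convex } \ell(y,\cdot)\text{)}.

% Resulting min-max problem, bilinear in (u, \theta) through the design X
\min_{\theta \in \mathbb{R}^{d}} \; \max_{u \in \mathbb{R}^{n}} \;
\frac{1}{n}\, u^{\top} X \theta
\;-\; \frac{1}{n}\sum_{i=1}^{n} \ell^{*}(y_i, u_i)
\;+\; r(\theta).
```

The design X enters only through the bilinear term, which is the term that CGMT-style arguments replace with a simpler auxiliary problem in the Gaussian case.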

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.