Occam's Razor is Only as Sharp as Your ELBO

2026-04-30 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, long

Summary

A study by Ethan Harvey and Michael C. Hughes of Tufts University shows that hyperparameter learning by maximizing the Evidence Lower Bound (ELBO) in variational inference, often regarded as a mathematical embodiment of Occam's razor, can overfit or underfit depending on the covariance structure, such as the rank, assumed for a Gaussian approximate posterior. Whereas prior work showed that mean-field approximations cause underfitting, this research presents a clear case of ELBO-based overfitting in an over-parameterized Bayesian linear regression model. Specifically, a rank-1 covariance for the approximate posterior leads to overfitting by systematically underestimating the likelihood variance, especially when the number of parameters (R) exceeds the number of data points (N). Conversely, a diagonal covariance leads to underfitting. Surprisingly, Bayesian model selection via the exact log marginal likelihood (LML) sometimes prefers the overfit option over the underfit one, a preference the ELBO does not share.
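A standard variational identity, general and not specific to this paper, shows why the covariance family matters. In the notation below, which is ours, θ denotes hyperparameters such as the likelihood variance, w the regression weights, and Q the allowed family of Gaussian approximate posteriors (diagonal, rank-1, or full-rank):

```latex
\begin{aligned}
\log p(\mathbf{y} \mid \theta)
  &= \mathcal{L}(q, \theta)
   + \mathrm{KL}\big(q(\mathbf{w}) \,\|\, p(\mathbf{w} \mid \mathbf{y}, \theta)\big), \\
\mathcal{L}(q, \theta)
  &= \mathbb{E}_{q}\big[\log p(\mathbf{y} \mid \mathbf{w}, \theta)\big]
   - \mathrm{KL}\big(q(\mathbf{w}) \,\|\, p(\mathbf{w} \mid \theta)\big), \\
\max_{q \in \mathcal{Q}} \mathcal{L}(q, \theta)
  &= \log p(\mathbf{y} \mid \theta)
   - \min_{q \in \mathcal{Q}} \mathrm{KL}\big(q(\mathbf{w}) \,\|\, p(\mathbf{w} \mid \mathbf{y}, \theta)\big).
\end{aligned}
```

Because the KL gap in the last line generally changes with θ, the θ that maximizes the ELBO need not maximize the LML. Which way that bias points depends on the family Q.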

Key takeaway

Research scientists developing scalable Bayesian models should carefully consider the implications of reduced-rank covariance approximations for approximate posteriors. The choice of covariance structure (e.g., diagonal vs. rank-1) directly affects whether ELBO-based hyperparameter learning underfits or overfits, which can lead to suboptimal model selection. Be cautious: the exact marginal likelihood may even favor an overfit model that the ELBO rejects.

Key insights

The ELBO's model-selection behavior depends critically on the covariance structure of the approximate posterior.

Principles

The ELBO can underfit with a diagonal covariance.
The ELBO can overfit with a rank-1 covariance.
The exact LML may prefer the overfit model over the underfit one.
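One way to see how covariance structure can push the learned likelihood variance down is sketched below. It assumes a model with prior w ~ N(0, τ² I) (τ² fixed) and likelihood y | w ~ N(Xw, σ² I). For a fixed q(w) = N(m, S), setting the ELBO's derivative with respect to σ² to zero gives σ² = (||y - X m||^2 + tr(X S X^T)) / N. The posterior covariance enters only through tr(X S X^T), so a covariance that captures less spread along the data directions yields a smaller σ². The helper `rank1_truncation` is a hypothetical stand-in for a rank-1 posterior, not the paper's fitted approximation. In practice q and σ² are typically optimized jointly.

```python
import numpy as np


def optimal_sigma2(X, y, m, S):
    """Likelihood variance that maximizes the ELBO for a fixed q(w) = N(m, S).

    The KL term to a sigma2-free prior does not depend on sigma2, so
    d ELBO / d sigma2 = 0 gives (||y - X m||^2 + tr(X S X^T)) / N.
    """
    N = X.shape[0]
    resid = y - X @ m
    return (resid @ resid + np.trace(X @ S @ X.T)) / N


def rank1_truncation(S):
    """Hypothetical rank-1 covariance: keep only the leading eigenpair of S."""
    eigvals, eigvecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    u = eigvecs[:, -1]
    return eigvals[-1] * np.outer(u, u)


rng = np.random.default_rng(1)
N, R = 5, 20                              # over-parameterized: R > N
X = rng.normal(size=(N, R))               # synthetic data
y = rng.normal(size=N)
sigma2, tau2 = 0.5, 1.0

# Exact posterior under w ~ N(0, tau2 I), y | w ~ N(X w, sigma2 I).
S_full = np.linalg.inv(X.T @ X / sigma2 + np.eye(R) / tau2)
m = S_full @ X.T @ y / sigma2
S_rank1 = rank1_truncation(S_full)

s2_full = optimal_sigma2(X, y, m, S_full)
s2_rank1 = optimal_sigma2(X, y, m, S_rank1)
print(f"sigma2 update: full S -> {s2_full:.4f}, rank-1 S -> {s2_rank1:.4f}")
```

In this toy R > N setup with an isotropic prior, the leading eigenvector of the exact posterior covariance lies (up to round-off) in the null space of X, where the posterior variance equals the prior variance τ². The rank-1 term therefore adds almost nothing to tr(X S X^T), and the updated σ² is smaller than with the full covariance.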

Method

The study uses an over-parameterized Bayesian linear regression model with Gaussian approximate posteriors, varying the covariance structure (diagonal, rank-1, full-rank) to observe how ELBO-based hyperparameter learning behaves.
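A minimal NumPy sketch of the core quantities follows. It assumes a zero-mean isotropic prior w ~ N(0, τ² I), a likelihood y | w ~ N(Xw, σ² I), and synthetic data. It computes the exact LML and the closed-form ELBO for a full-covariance q (the exact posterior) and for the optimal diagonal q. All function names are hypothetical and not taken from the ethanharvey98/overfitting-ELBO repository. The rank-1 case is omitted because this summary does not say how the paper parameterizes it.

```python
import numpy as np


def exact_lml(X, y, sigma2, tau2):
    """Exact log p(y): marginally y ~ N(0, tau2 * X X^T + sigma2 * I_N)."""
    N = X.shape[0]
    C = tau2 * X @ X.T + sigma2 * np.eye(N)
    _, logdet = np.linalg.slogdet(C)
    quad = y @ np.linalg.solve(C, y)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + quad)


def posterior_precision(X, sigma2, tau2):
    """Precision matrix Lambda of the exact Gaussian posterior over w."""
    R = X.shape[1]
    return X.T @ X / sigma2 + np.eye(R) / tau2


def exact_posterior(X, y, sigma2, tau2):
    """Mean and full covariance of p(w | y)."""
    S = np.linalg.inv(posterior_precision(X, sigma2, tau2))
    m = S @ X.T @ y / sigma2
    return m, S


def mean_field_posterior(X, y, sigma2, tau2):
    """Optimal diagonal-covariance q for this Gaussian target.

    For a Gaussian target, the mean-field optimum keeps the exact mean and
    uses variances 1 / Lambda_ii (inverse diagonal of the precision).
    """
    m, _ = exact_posterior(X, y, sigma2, tau2)
    S = np.diag(1.0 / np.diag(posterior_precision(X, sigma2, tau2)))
    return m, S


def gaussian_elbo(X, y, m, S, sigma2, tau2):
    """Closed-form ELBO for q(w) = N(m, S) under the assumed model."""
    N, R = X.shape
    resid = y - X @ m
    # E_q[log p(y | w)]: the covariance of q enters through tr(X S X^T).
    exp_loglik = (-0.5 * N * np.log(2 * np.pi * sigma2)
                  - 0.5 * (resid @ resid + np.trace(X @ S @ X.T)) / sigma2)
    # KL(N(m, S) || N(0, tau2 * I)).
    _, logdet_S = np.linalg.slogdet(S)
    kl = 0.5 * (np.trace(S) / tau2 + m @ m / tau2 - R
                + R * np.log(tau2) - logdet_S)
    return exp_loglik - kl


# Over-parameterized toy problem (R > N) with synthetic data.
rng = np.random.default_rng(0)
N, R = 5, 20
X = rng.normal(size=(N, R))
y = rng.normal(size=N)
sigma2, tau2 = 0.5, 1.0

lml = exact_lml(X, y, sigma2, tau2)
elbo_full = gaussian_elbo(X, y, *exact_posterior(X, y, sigma2, tau2), sigma2, tau2)
elbo_diag = gaussian_elbo(X, y, *mean_field_posterior(X, y, sigma2, tau2), sigma2, tau2)
print(f"LML={lml:.4f}  ELBO(full)={elbo_full:.4f}  ELBO(diag)={elbo_diag:.4f}")
```

The full-covariance ELBO matches the LML up to round-off, while the diagonal ELBO falls strictly below it. Sweeping `sigma2` and comparing where each curve peaks is one way to see how that gap can bias ELBO-based hyperparameter learning.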

In practice

Low-rank covariance approximations can cause ELBO-based overfitting.
Consider tempered VI to prevent ELBO-based overfitting.
The importance-weighted ELBO (IWELBO) may not prevent overfitting with a rank-1 covariance.
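Both alternatives listed above can be estimated by Monte Carlo from samples of q(w) = N(m, S). The IWELBO is log (1/K) Σ_k p(y, w_k) / q(w_k) with w_k ~ q. The summary does not state the paper's tempering scheme, so this sketch uses one common form that scales the KL term by a hypothetical weight `beta` (beta = 1 recovers the standard ELBO). The model is assumed to be the same zero-mean isotropic Gaussian prior and Gaussian likelihood. S must be positive definite for the Cholesky factor, so a strictly rank-1 covariance would need a small diagonal added.

```python
import numpy as np


def exact_lml(X, y, sigma2, tau2):
    """Exact log p(y) when y ~ N(0, tau2 * X X^T + sigma2 * I)."""
    N = X.shape[0]
    C = tau2 * X @ X.T + sigma2 * np.eye(N)
    _, logdet = np.linalg.slogdet(C)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + y @ np.linalg.solve(C, y))


def log_mean_exp(a):
    """Numerically stable log(mean(exp(a)))."""
    a_max = np.max(a)
    return a_max + np.log(np.mean(np.exp(a - a_max)))


def sample_terms(X, y, m, S, sigma2, tau2, K, rng):
    """Per-sample log p(y | w_k) and log p(w_k) - log q(w_k), w_k ~ N(m, S)."""
    N, R = X.shape
    L = np.linalg.cholesky(S)                 # requires S positive definite
    z = rng.standard_normal((K, R))
    w = m + z @ L.T                           # (K, R) samples from q
    resid = y - w @ X.T                       # (K, N) residuals
    log_lik = -0.5 * (N * np.log(2 * np.pi * sigma2)
                      + np.sum(resid**2, axis=1) / sigma2)
    log_prior = -0.5 * (R * np.log(2 * np.pi * tau2)
                        + np.sum(w**2, axis=1) / tau2)
    # Since w_k - m = L z_k, the Mahalanobis term of log q(w_k) is ||z_k||^2.
    log_q = -0.5 * (R * np.log(2 * np.pi)
                    + 2.0 * np.sum(np.log(np.diag(L)))
                    + np.sum(z**2, axis=1))
    return log_lik, log_prior - log_q


def iwelbo(X, y, m, S, sigma2, tau2, K, rng):
    """Importance-weighted ELBO estimate with K samples from q."""
    log_lik, log_ratio = sample_terms(X, y, m, S, sigma2, tau2, K, rng)
    return log_mean_exp(log_lik + log_ratio)


def tempered_elbo(X, y, m, S, sigma2, tau2, beta, K, rng):
    """Monte Carlo E_q[log p(y | w)] - beta * KL(q || p); beta is hypothetical."""
    log_lik, log_ratio = sample_terms(X, y, m, S, sigma2, tau2, K, rng)
    return np.mean(log_lik) + beta * np.mean(log_ratio)


rng = np.random.default_rng(2)
N, R = 5, 20
X = rng.normal(size=(N, R))                   # synthetic data
y = rng.normal(size=N)
sigma2, tau2 = 0.5, 1.0

# Exact posterior, used as q so both estimators can be checked against the LML.
S = np.linalg.inv(X.T @ X / sigma2 + np.eye(R) / tau2)
S = 0.5 * (S + S.T)                           # symmetrize round-off
m = S @ X.T @ y / sigma2

lml = exact_lml(X, y, sigma2, tau2)
iw = iwelbo(X, y, m, S, sigma2, tau2, K=64, rng=rng)
tv = tempered_elbo(X, y, m, S, sigma2, tau2, beta=1.0, K=64, rng=rng)
print(f"LML={lml:.4f}  IWELBO={iw:.4f}  tempered(beta=1)={tv:.4f}")
```

With q set to the exact posterior, every importance weight equals p(y), so both estimates reproduce the LML for any K. With a restricted q, the expected IWELBO lies below the LML and is non-decreasing in K, so how much it helps depends on how well q covers the posterior.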

Topics

Variational Inference
Evidence Lower Bound
Bayesian Model Selection
Approximate Posterior
Rank-1 Covariance

Code references

ethanharvey98/overfitting-ELBO

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.