Occam's Razor is Only as Sharp as Your ELBO
Summary
A study by Ethan Harvey and Michael C. Hughes of Tufts University shows that the Evidence Lower Bound (ELBO) used in variational inference, often regarded as a mathematical embodiment of Occam's razor, can lead to either overfitting or underfitting depending on the assumed rank of the covariance matrix in a Gaussian approximate posterior. Prior work showed that mean-field approximations can cause underfitting; this research presents a clear case of ELBO-based overfitting in an over-parameterized Bayesian linear regression model. Specifically, a rank-1 covariance matrix for the approximate posterior leads to overfitting by systematically underestimating the likelihood variance, especially when the number of parameters (R) exceeds the number of data points (N). Conversely, a diagonal covariance leads to underfitting. Surprisingly, Bayesian model selection via the exact log-marginal likelihood (LML) sometimes prefers the overfit option over the underfit one, a preference the ELBO does not share.
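For reference, the standard identity linking the ELBO to the LML (general to variational inference, not specific to this paper) shows why the covariance family matters. For any approximate posterior q(w) and hyperparameters θ:

```latex
\log p(y \mid \theta)
  = \underbrace{\mathbb{E}_{q(w)}\!\left[\log p(y \mid w, \theta)\right]
    - \mathrm{KL}\!\left(q(w) \,\|\, p(w \mid \theta)\right)}_{\mathrm{ELBO}(q,\,\theta)}
  + \mathrm{KL}\!\left(q(w) \,\|\, p(w \mid y, \theta)\right)
```

Because the last KL term is non-negative, the ELBO never exceeds the LML. Maximizing the ELBO over θ is therefore maximizing the LML minus this gap, which also rewards hyperparameters under which the exact posterior is easy for the restricted family of q to match. That secondary pressure is how a diagonal or rank-1 covariance can bias hyperparameter learning.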
Key takeaway
Research scientists developing scalable Bayesian models should carefully consider the implications of reduced-rank approximate posteriors. The choice of covariance structure (e.g., diagonal vs. rank-1) directly influences whether ELBO-based hyperparameter learning underfits or overfits, which can lead to suboptimal model selection. Be cautious: the exact marginal likelihood may even favor an overfit model that the ELBO rejects.
Key insights
ELBO's model selection behavior depends critically on the approximate posterior's covariance structure.
Principles
- ELBO can underfit with diagonal covariance.
- ELBO can overfit with rank-1 covariance.
- Exact LML may prefer overfit models.
Method
The study uses a Bayesian linear regression model with Gaussian approximate posteriors, varying the covariance structure (diagonal, rank-1, full-rank) to observe how the ELBO behaves when used to learn hyperparameters such as the likelihood variance.
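The comparison described in the method can be sketched for the conjugate case. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a prior w ~ N(0, tau^2 I) with tau fixed, Gaussian noise with standard deviation sigma, hypothetical synthetic data with more parameters than observations (R > N), and a simple grid search over sigma. The function names are hypothetical. For each sigma it compares the exact LML with the ELBO at the best full-covariance and best diagonal Gaussian q. The rank-1 family is left out to keep everything in closed form.

```python
import numpy as np


def log_marginal_likelihood(X, y, sigma, tau):
    """Exact LML of the conjugate model: y ~ N(0, tau^2 X X^T + sigma^2 I)."""
    N = X.shape[0]
    C = tau**2 * X @ X.T + sigma**2 * np.eye(N)
    _, logdet = np.linalg.slogdet(C)
    alpha = np.linalg.solve(C, y)
    return -0.5 * (N * np.log(2 * np.pi) + logdet + y @ alpha)


def elbo(X, y, sigma, tau, m, S):
    """Closed-form ELBO for q(w) = N(m, S) under the assumed prior and noise model."""
    N, R = X.shape
    resid = y - X @ m
    # E_q[log p(y | w)]: the trace term accounts for posterior uncertainty in w
    exp_loglik = -0.5 * N * np.log(2 * np.pi * sigma**2) - 0.5 * (
        resid @ resid + np.trace(X.T @ X @ S)
    ) / sigma**2
    # KL(N(m, S) || N(0, tau^2 I))
    _, logdet_S = np.linalg.slogdet(S)
    kl = 0.5 * ((np.trace(S) + m @ m) / tau**2 - R + R * np.log(tau**2) - logdet_S)
    return exp_loglik - kl


def optimal_q(X, y, sigma, tau, diagonal):
    """ELBO-optimal mean and covariance within the full or diagonal Gaussian family."""
    R = X.shape[1]
    precision = X.T @ X / sigma**2 + np.eye(R) / tau**2
    m = np.linalg.solve(precision, X.T @ y / sigma**2)  # same optimal mean for both families
    if diagonal:
        S = np.diag(1.0 / np.diag(precision))  # mean-field optimum: inverse diagonal of precision
    else:
        S = np.linalg.inv(precision)  # exact posterior covariance
    return m, S


# Hypothetical synthetic data in the over-parameterized regime (R > N)
rng = np.random.default_rng(0)
N, R, tau = 20, 50, 1.0
X = rng.normal(size=(N, R))
w_true = rng.normal(size=R)
y = X @ w_true / np.sqrt(R) + 0.3 * rng.normal(size=N)

# Grid search over the noise standard deviation sigma
sigmas = np.linspace(0.05, 2.0, 40)
lml = [log_marginal_likelihood(X, y, s, tau) for s in sigmas]
elbo_full = [elbo(X, y, s, tau, *optimal_q(X, y, s, tau, diagonal=False)) for s in sigmas]
elbo_diag = [elbo(X, y, s, tau, *optimal_q(X, y, s, tau, diagonal=True)) for s in sigmas]

print("LML selects sigma       =", sigmas[np.argmax(lml)])
print("diag ELBO selects sigma =", sigmas[np.argmax(elbo_diag)])
```

With the full-covariance q, the ELBO equals the LML at every sigma because q then matches the exact posterior. The diagonal curve sits below it, and comparing the two selected sigma values shows which noise level each criterion prefers on this particular dataset.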
In practice
- Low-rank approximations can cause ELBO overfitting.
- Consider tempered VI to prevent ELBO overfitting.
- IWELBO may not prevent overfitting with rank-1 covariance.
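The IWELBO mentioned in the list replaces the ELBO's expected log-weight with the log of an average of K importance weights. Its expectation is non-decreasing in K and approaches the LML as K grows. The following is a minimal Monte Carlo sketch under the same assumed conjugate setup (prior N(0, tau^2 I), Gaussian noise with standard deviation sigma, hypothetical synthetic data). The names `iwelbo` and `gaussian_logpdf` are illustrative, not from the paper, and the sketch uses a diagonal q rather than the paper's rank-1 family.

```python
import numpy as np


def gaussian_logpdf(x, mean, cov):
    """Log-density of N(mean, cov) evaluated at each row of x, via Cholesky."""
    L = np.linalg.cholesky(cov)
    diff = np.linalg.solve(L, (x - mean).T)  # shape (d, K)
    d = mean.shape[0]
    return -0.5 * (d * np.log(2 * np.pi) + 2 * np.sum(np.log(np.diag(L)))
                   + np.sum(diff**2, axis=0))


def iwelbo(X, y, sigma, tau, m, S, K, rng):
    """One Monte Carlo draw of the K-sample importance-weighted ELBO for q = N(m, S)."""
    N, R = X.shape
    w = rng.multivariate_normal(m, S, size=K)  # (K, R) samples from q
    resid = y[None, :] - w @ X.T               # (K, N) residuals per sample
    log_lik = (-0.5 * N * np.log(2 * np.pi * sigma**2)
               - 0.5 * np.sum(resid**2, axis=1) / sigma**2)
    log_prior = gaussian_logpdf(w, np.zeros(R), tau**2 * np.eye(R))
    log_q = gaussian_logpdf(w, m, S)
    log_w = log_lik + log_prior - log_q        # log importance weights
    a = log_w.max()                            # stable log-mean-exp
    return a + np.log(np.mean(np.exp(log_w - a)))


# Hypothetical synthetic data (R > N)
rng = np.random.default_rng(1)
N, R, sigma, tau = 10, 30, 0.5, 1.0
X = rng.normal(size=(N, R))
y = rng.normal(size=N)

# Exact LML for comparison: y ~ N(0, tau^2 X X^T + sigma^2 I)
C = tau**2 * X @ X.T + sigma**2 * np.eye(N)
exact_lml = -0.5 * (N * np.log(2 * np.pi) + np.linalg.slogdet(C)[1]
                    + y @ np.linalg.solve(C, y))

# Exact posterior, and a diagonal approximation sharing its mean
precision = X.T @ X / sigma**2 + np.eye(R) / tau**2
S_post = np.linalg.inv(precision)
m_post = S_post @ X.T @ y / sigma**2
S_diag = np.diag(1.0 / np.diag(precision))

iw_exact = iwelbo(X, y, sigma, tau, m_post, S_post, K=100, rng=rng)
iw_diag = [iwelbo(X, y, sigma, tau, m_post, S_diag, K=k, rng=rng) for k in (1, 10, 1000)]
print("exact LML:", exact_lml)
print("IWELBO, exact q:", iw_exact)
print("IWELBO, diagonal q, K = 1, 10, 1000:", iw_diag)
```

With q set to the exact posterior, every importance weight equals p(y), so the estimate reproduces the LML for any K. With the diagonal q, larger K moves the estimate toward the LML on average, although any single draw is noisy.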
Topics
- Variational Inference
- Evidence Lower Bound
- Bayesian Model Selection
- Approximate Posterior
- Rank-1 Covariance
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.