Gaussian Mean Field Variational Inference can Overestimate Predictive Variance

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

James Odgers, Ben Riegler, Siddharth Swaroop, and Vincent Fortuin demonstrate that Mean Field Variational Inference (MFVI), commonly believed to underestimate posterior variance, can paradoxically overestimate predictive variance. Through an analysis of conjugate Bayesian Linear Regression (BLR), the authors reveal that while MFVI indeed underestimates variance in parameter space, it can yield higher predictive variances than the exact posterior. This overestimation is particularly pronounced in directions where training data is concentrated. A surprising result is that for test points drawn from the training distribution, MFVI's expected predictive variance surpasses that of the exact posterior. The paper illustrates a pathological scenario where MFVI fails to reduce predictive variance compared to the prior for in-distribution data. The authors link these findings to the Cold Posterior Effect, proposing that adjusting the temperature can mitigate this overestimation, bringing predictions closer to the exact posterior, and validate their theory on synthetic and real-world regression tasks.

Key takeaway

Machine Learning Engineers who rely on Mean Field Variational Inference (MFVI) for uncertainty quantification should be aware that it can overestimate predictive variance, particularly for in-distribution data. Test predictive variance specifically in regions dense with training data, not only far from it. Adjusting the posterior temperature, as in the Cold Posterior Effect literature, may correct this overestimation and bring uncertainty estimates closer to those of the exact posterior.
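As a rough illustration of the temperature idea, the sketch below tempers the whole posterior of a conjugate Bayesian linear regression by a single factor T, which scales the mean-field variances by T, and then solves for the T that matches the exact average predictive variance on the training inputs. This is only one simple way to temper and is not necessarily the construction used in the paper. The data, the prior precision `alpha`, the noise precision `beta`, and all variable names are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 300, 4
alpha, beta = 1.0, 2.0  # assumed prior precision and noise precision

# Synthetic, correlated training inputs (illustrative only).
X = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))

# Exact posterior precision for conjugate Bayesian linear regression.
precision = alpha * np.eye(d) + beta * X.T @ X
cov_exact = np.linalg.inv(precision)

# KL(q || p)-optimal factorized Gaussian: variances are 1 / precision_ii.
var_mfvi = 1.0 / np.diag(precision)

# Average predictive variance of the latent function on the training inputs.
avg_exact = np.einsum("ni,ij,nj->n", X, cov_exact, X).mean()
avg_mfvi = ((X ** 2) @ var_mfvi).mean()

# Assumption: tempering the full posterior by T multiplies every mean-field
# variance by T, so the matching temperature is a simple ratio.
T_star = avg_exact / avg_mfvi
avg_tempered = ((X ** 2) @ (T_star * var_mfvi)).mean()

print(f"matching temperature T* = {T_star:.3f}")
```

Under these assumptions the matching temperature never exceeds 1, that is, a "cold" posterior, which is consistent with the link the summary draws to the Cold Posterior Effect.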

Key insights

MFVI, despite underestimating parameter variance, can overestimate predictive variance, especially where training data concentrates.

Method

The paper analyzes conjugate Bayesian Linear Regression (BLR), where the exact posterior is available in closed form, to demonstrate MFVI's predictive variance overestimation. It then connects this result to the Cold Posterior Effect and validates the theory on synthetic and real-world regression tasks.
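The effect can be reproduced in a few lines for conjugate BLR. The following is a minimal sketch, not the authors' code: it uses the standard result that the KL(q || p)-optimal factorized Gaussian for a Gaussian target keeps the mean and sets each variance to the reciprocal of the corresponding diagonal entry of the posterior precision. The synthetic data, the prior precision `alpha`, and the noise precision `beta` are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 5
alpha, beta = 1.0, 4.0  # assumed prior precision and noise precision

# Correlated features, so the exact posterior has off-diagonal structure.
mixing = rng.normal(size=(d, d))
X = rng.normal(size=(n, d)) @ mixing.T

# Exact BLR posterior: precision = alpha * I + beta * X^T X.
precision = alpha * np.eye(d) + beta * X.T @ X
cov_exact = np.linalg.inv(precision)

# Gaussian mean-field optimum: per-weight variance 1 / precision_ii.
var_mfvi = 1.0 / np.diag(precision)

# Parameter space: mean-field marginals are never wider than the exact ones.
param_ratio = var_mfvi / np.diag(cov_exact)

# Function space: predictive variance x^T S x at each training input
# (observation noise 1 / beta would add the same constant to both).
pred_exact = np.einsum("ni,ij,nj->n", X, cov_exact, X)
pred_mfvi = (X ** 2) @ var_mfvi

print("max parameter variance ratio (MFVI / exact):", param_ratio.max())
print("mean predictive variance, exact:", pred_exact.mean())
print("mean predictive variance, MFVI: ", pred_mfvi.mean())
```

Here the training inputs stand in for in-distribution test points: the per-weight variance ratio stays at or below 1, while the mean-field predictive variance averaged over those inputs is at least as large as the exact one.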

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.