Scalable Mean-Field Variational Inference via Preconditioned Primal-Dual Optimization
Summary
This work introduces Primal-Dual Variational Inference (PD-VI) and its block-preconditioned variant, P2D-VI, for large-scale mean-field variational inference (MFVI) in mini-batch settings. The authors reformulate MFVI as a constrained finite-sum problem, which enables a primal-dual algorithm that jointly updates global and local variational parameters. P2D-VI adds block-preconditioning, which adapts updates to heterogeneous loss geometries across parameter blocks and improves efficiency and numerical robustness. The authors establish convergence guarantees for both methods with constant step sizes: an $\mathcal{O}(1/T)$ rate to a stationary point in general settings and linear convergence under strong convexity, without relying on conjugacy assumptions or explicit bounded-variance conditions. Numerical experiments on synthetic Gaussian mixture models and a real-world large-scale spatial transcriptomics dataset (MOSTA, 150,000 locations, 20,000 genes) show that PD-VI and P2D-VI consistently outperform existing stochastic variational inference approaches in convergence speed and solution quality, particularly in spatial domain detection tasks.
Key takeaway
Research scientists working on large-scale Bayesian inference problems, especially those involving non-conjugate posteriors or high-dimensional latent variables, as in spatial transcriptomics, should consider the P2D-VI framework. Its reported gains in convergence speed and solution quality, even with constant step sizes and non-i.i.d. mini-batches, offer an advantage over traditional SVI methods and could accelerate research and improve model accuracy in complex biological or statistical applications.
Key insights
Primal-dual optimization with block-preconditioning gives a scalable, numerically robust approach to large-scale mean-field variational inference, and it converges faster than existing SVI methods in the reported experiments.
Principles
- Reformulate MFVI as a constrained finite-sum problem.
- Adapt updates to parameter heterogeneity via block-preconditioning.
- Constant step sizes suffice for the convergence guarantees: $\mathcal{O}(1/T)$ in general settings and linear under strong convexity.
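As an illustration of the first principle, one generic way to write a constrained finite-sum problem is with a consensus constraint. This is a sketch under assumed notation, not necessarily the formulation used in the paper: $f_i$ stands for the loss contribution of data point $i$, $\theta$ for the global variational parameters, $\theta_i$ for a local copy of them, $\lambda_i$ for the local variational parameters, $\mu_i$ for the dual variables, and $\rho > 0$ for a penalty weight.

```latex
\min_{\theta,\,\{\theta_i\},\,\{\lambda_i\}} \; \sum_{i=1}^{n} f_i(\theta_i, \lambda_i)
\quad \text{subject to} \quad \theta_i = \theta, \qquad i = 1, \dots, n,

\mathcal{L}_\rho\bigl(\theta, \{\theta_i\}, \{\lambda_i\}, \{\mu_i\}\bigr)
  = \sum_{i=1}^{n} \Bigl[ f_i(\theta_i, \lambda_i)
  + \langle \mu_i,\, \theta_i - \theta \rangle
  + \tfrac{\rho}{2}\, \lVert \theta_i - \theta \rVert^2 \Bigr].
```

In this generic form, each term of the sum touches only one data point, so a mini-batch method can update the sampled $(\theta_i, \lambda_i, \mu_i)$ and then refresh $\theta$.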
Method
PD-VI and P2D-VI use a mini-batch primal-dual algorithm based on an augmented Lagrangian formulation, jointly updating global and local variational parameters. P2D-VI adds block-preconditioning to rescale updates for different parameter blocks.
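The description above can be made concrete with a toy sketch. It is not the paper's algorithm: it assumes a quadratic finite sum in place of the MFVI objective, a consensus constraint `x_i = z`, exact local minimization, and a diagonal per-coordinate penalty standing in for block-preconditioning. The function name `primal_dual_consensus` and all parameter names are hypothetical.

```python
import numpy as np


def primal_dual_consensus(a, curv, penalty, batch_size=10, epochs=40, seed=0):
    """Toy mini-batch primal-dual iteration (illustrative assumption only).

    Minimises sum_i f_i(z) with f_i(x) = 0.5 * sum_k curv[k] * (x[k] - a[i, k])**2
    by introducing local copies x_i with the constraint x_i = z and using the
    augmented Lagrangian
        f_i(x_i) + y_i . (x_i - z) + 0.5 * sum_k penalty[k] * (x_i[k] - z[k])**2.
    `penalty` is a per-coordinate vector, so each coordinate can be rescaled.
    """
    rng = np.random.default_rng(seed)
    n, p = a.shape
    x = np.zeros((n, p))  # local primal copies x_i
    y = np.zeros((n, p))  # dual variables for the constraints x_i = z
    z = np.zeros(p)       # shared (global) variable
    for _ in range(epochs):
        order = rng.permutation(n)  # each epoch visits every index once
        for start in range(0, n, batch_size):
            idx = order[start:start + batch_size]
            # Primal step: exact minimiser of the local augmented Lagrangian.
            x[idx] = (curv * a[idx] - y[idx] + penalty * z) / (curv + penalty)
            # Dual ascent step on the sampled constraints only.
            y[idx] += penalty * (x[idx] - z)
            # Global step: minimiser of the augmented Lagrangian in z,
            # using the stored values for indices outside the batch.
            z = np.mean(x + y / penalty, axis=0)
    return z, x, y


# Example run: three coordinates with very different curvature.
data_rng = np.random.default_rng(1)
curv = np.array([0.01, 1.0, 100.0])
a = data_rng.normal(size=(200, 3))
z, x, y = primal_dual_consensus(a, curv, penalty=curv)
print("distance to minimiser:", np.max(np.abs(z - a.mean(axis=0))))
```

With `penalty=curv`, each coordinate's penalty matches its curvature, which is one simple way to rescale updates across differently scaled coordinates. Passing a constant vector such as `np.ones(3)` instead gives a single shared penalty for comparison.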
In practice
- Apply PD-VI/P2D-VI for large-scale Bayesian inference.
- Use block-preconditioning for models with heterogeneous parameter scales.
- Consider these methods for spatial transcriptomics analysis and spatial domain detection.
Topics
- Mean-Field Variational Inference
- Primal-Dual Optimization
- Preconditioning Algorithms
- Stochastic Variational Inference
- Spatial Transcriptomics
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.