Safe, Scalable, and Accurate Bayes Posterior Sampling for Large-Data Generalized Linear Mixed Models

2026-04-30 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

This paper introduces a stochastic mirror Langevin dynamics (SMLD) algorithm for safe, scalable, and accurate Bayesian posterior sampling in large-data generalized linear mixed models (GLMMs). Standard stochastic gradient Langevin dynamics (SGLD), when applied to smoothly re-parameterized constrained parameters such as GLMM covariance matrices, can produce divergent Markov chains. SMLD avoids this by transferring mirror Langevin dynamics to hierarchical GLMs, which yields ergodic chains for common GLMM likelihoods. The authors give a rigorous error analysis showing that, for step sizes ε = O(n^(-(1+δ))), SMLD's squared distance to the target posterior decays like O(n^(-δ)), where n denotes the data size. They also propose a non-intrusive post-processing step that corrects the bias in posterior variance estimates caused by subsampling, yielding an asymptotically order-wise correct estimate. Empirical validation includes simulated experiments and a longitudinal study of pain trajectories in breast cancer survivors, where the method proves accurate and computationally efficient compared with conventional MCMC.

Key takeaway

For AI Scientists and Research Scientists working with large-scale Bayesian GLMMs, SMLD is a strong option for robust and accurate posterior sampling. Standard SGLD risks divergence on constrained parameters, whereas SMLD combines algorithmic safety with efficiency. Apply the proposed post-processing step to correct the subsampling bias, so that posterior variance estimates are asymptotically order-wise correct and reliable enough to calibrate Bayesian p-values, especially in biomedical and other longitudinal studies.

Key insights

SMLD offers a safe, scalable, and accurate Bayesian sampling method for large-data GLMMs with constrained parameters.

Principles

Smooth re-parameterization can cause stochastic-gradient MCMC (SGMCMC) to diverge.
Mirror Langevin dynamics ensures ergodic chains for constrained parameters.
Post-processing can correct subsampling-induced variance bias.
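The safety principle can be illustrated on the simplest constrained case, a single positive variance-like parameter. The sketch below is a plain (full-gradient, non-stochastic) mirror Langevin sampler that assumes the entropic mirror map φ(x) = x log x - x, a hypothetical choice that is not necessarily the mirror map used in the paper. Each update happens in the dual coordinate y = log x and is mapped back with x = exp(y), so every iterate stays strictly positive without clipping. The function name mirror_langevin_positive and the Gamma(3, 2) target are illustrative assumptions.

```python
import math
import random


def mirror_langevin_positive(grad_f, x0, step, n_steps, seed=0):
    """Mirror Langevin sampler for scalar x > 0 targeting pi(x) ∝ exp(-f(x)).

    Hypothetical mirror map phi(x) = x*log(x) - x, so that
    grad phi(x) = log(x), hess phi(x) = 1/x and grad phi*(y) = exp(y).
    """
    rng = random.Random(seed)
    x = x0
    samples = []
    for _ in range(n_steps):
        y = math.log(x)  # move to the dual (mirror) coordinate
        xi = rng.gauss(0.0, 1.0)
        # Euler step in dual space; noise scaled by hess phi(x)^(1/2) = x^(-1/2)
        y = y - step * grad_f(x) + math.sqrt(2.0 * step / x) * xi
        x = math.exp(y)  # map back to the primal space: always positive
        samples.append(x)
    return samples


# Example target: Gamma(shape=a, rate=b), so f(x) = -(a - 1) log x + b x.
a, b = 3.0, 2.0


def grad_f(x):
    return -(a - 1.0) / x + b


draws = mirror_langevin_positive(grad_f, x0=1.0, step=5e-3, n_steps=100_000)
kept = draws[1_000:]  # discard a short burn-in
print(min(kept) > 0, sum(kept) / len(kept))
```

In the continuous-time limit this update becomes dx = (1 - x f'(x)) dt + √(2x) dW, whose stationary density is exp(-f), so the retained draws should all be positive with a mean close to a/b = 1.5; the remaining gap reflects Monte Carlo error and the O(step) bias of the Euler discretization.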

Method

The SMLD algorithm uses a mirror map and stochastic gradients with data subsampling. A post-processing step, based on solving a Lyapunov equation, re-scales samples to correct posterior variance estimates.
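The Lyapunov-based rescaling can be made concrete under a Gaussian approximation. The sketch below is an assumption-laden illustration, not the authors' construction: it assumes the posterior is close to Gaussian with precision H at the mode, that the subsampled-gradient noise has a known covariance C, and that the chain behaves like a linearized (Ornstein-Uhlenbeck) SGLD process with step ε, whose stationary covariance S then approximately solves the Lyapunov equation H S + S H = 2I + εC. It builds a matrix A with A S Aᵀ = H⁻¹ and applies it to centred samples. The helper names solve_lyapunov, variance_correction, and rescale are hypothetical.

```python
import numpy as np


def solve_lyapunov(H, Q):
    """Solve H S + S H^T = Q for S by Kronecker vectorisation (small d only)."""
    d = H.shape[0]
    eye = np.eye(d)
    # Column-major vec: vec(H S) = (I kron H) vec(S), vec(S H^T) = (H kron I) vec(S)
    K = np.kron(eye, H) + np.kron(H, eye)
    s = np.linalg.solve(K, Q.reshape(-1, order="F"))
    S = s.reshape(d, d, order="F")
    return 0.5 * (S + S.T)  # remove round-off asymmetry


def sym_sqrt(M):
    """Symmetric square root of a symmetric positive-definite matrix."""
    w, V = np.linalg.eigh(M)
    return (V * np.sqrt(w)) @ V.T


def variance_correction(H, C, step):
    """Hypothetical rescaling matrix A such that A S A^T = H^{-1}.

    H: assumed posterior precision at the mode (Laplace approximation).
    C: assumed covariance of the subsampled-gradient noise.
    S: stationary covariance of the linearised chain, from the Lyapunov
       equation H S + S H = 2 I + step * C.
    """
    d = H.shape[0]
    S = solve_lyapunov(H, 2.0 * np.eye(d) + step * C)
    A = sym_sqrt(np.linalg.inv(H)) @ np.linalg.inv(sym_sqrt(S))
    return S, A


def rescale(samples, A):
    """Map each centred sample x - mu to A (x - mu), so Cov -> A Cov A^T."""
    mu = samples.mean(axis=0)
    return mu + (samples - mu) @ A.T


# Usage on a toy 2-d problem with made-up H and C.
H = np.array([[4.0, 1.0], [1.0, 3.0]])
C = np.array([[50.0, 5.0], [5.0, 20.0]])
S, A = variance_correction(H, C, step=0.05)

rng = np.random.default_rng(1)
raw = rng.multivariate_normal(np.zeros(2), S, size=200_000)  # stand-in chain output
fixed = rescale(raw, A)
print(np.cov(fixed, rowvar=False))  # should be close to inv(H)
```

The Kronecker solve scales as O(d⁶) and is meant only for small d; scipy.linalg.solve_continuous_lyapunov solves the same equation more efficiently. The design keeps the correction non-intrusive in the sense the summary describes: the sampler runs unchanged and only its output is transformed.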

In practice

Apply SMLD for Bayesian inference in large GLMMs.
Use the proposed post-processing step to correct posterior variance estimates.
Consider R=1,000 MCMC samples for stochastic gradient estimation.
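These practice notes rest on stochastic gradients computed from data subsamples. Below is a minimal sketch of the standard unbiased minibatch estimator, written for a hypothetical logistic GLMM with one random intercept per subject and conditioned on the current values of those intercepts: subjects are drawn uniformly and the batch sum is scaled by n/m. This is the generic SGLD-style estimator rather than the paper's exact one, and it does not model how the R = 1,000 samples enter the authors' gradient estimate. The function names and the N(0, I) prior on β are assumptions.

```python
import itertools

import numpy as np


def subject_grad(beta, b_j, X_j, y_j):
    """Gradient w.r.t. beta of one subject's logistic log-likelihood,
    conditional on that subject's random intercept b_j."""
    eta = X_j @ beta + b_j
    p = 1.0 / (1.0 + np.exp(-eta))
    return X_j.T @ (y_j - p)


def minibatch_grad(beta, b, X, y, batch, prior_prec=1.0):
    """Unbiased estimate of grad log posterior in beta from a subject subsample.

    The batch sum is scaled by n/m so that, averaged over uniformly drawn
    batches, it equals the full-data sum; prior is N(0, I / prior_prec).
    """
    n, m = len(X), len(batch)
    lik = sum(subject_grad(beta, b[j], X[j], y[j]) for j in batch)
    return -prior_prec * beta + (n / m) * lik


# Toy data: 6 subjects, 5 observations each, 2 fixed-effect covariates.
rng = np.random.default_rng(0)
n_subj, n_obs, d = 6, 5, 2
X = [rng.normal(size=(n_obs, d)) for _ in range(n_subj)]
y = [rng.integers(0, 2, size=n_obs).astype(float) for _ in range(n_subj)]
b = rng.normal(scale=0.5, size=n_subj)
beta = np.array([0.3, -0.2])

full = minibatch_grad(beta, b, X, y, batch=range(n_subj))
avg = np.mean(
    [minibatch_grad(beta, b, X, y, list(s))
     for s in itertools.combinations(range(n_subj), 2)],
    axis=0,
)
print(np.allclose(full, avg))  # True
```

Averaging the estimator over every possible size-2 batch reproduces the full-data gradient exactly, which is the unbiasedness that SGLD-type samplers rely on; the price is extra gradient noise, and that noise is the source of the posterior variance bias the post-processing step targets.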

Topics

Generalized Linear Mixed Models
Bayesian Posterior Sampling
Stochastic Mirror Langevin Dynamics
Constrained Parameter Inference
Posterior Variance Correction

Best for: AI Scientist, Research Scientist, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.