Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

2025-12-31 · Source: JMLR · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new algorithmic framework approximates finite neural networks (NNs) with a mixture of Gaussian processes (GPs) and provides provable bounds on the approximation error. NNs and GPs are equivalent only in the infinite-width or infinite-depth limit with i.i.d. parameters; this work instead targets NNs of finite width and depth whose parameters need not be i.i.d. The approach measures closeness with the Wasserstein distance and, using tools from optimal transport, iteratively approximates each NN layer's output distribution as a GP mixture. Crucially, for any NN and $\epsilon > 0$, the framework returns a GP mixture that is $\epsilon$-close to the NN at a finite set of input points. Because the error bound is differentiable, it can also be used to tune NN parameters so that the network mimics the functional behavior of a given GP, which is useful for prior selection in Bayesian inference. Experiments on regression and classification problems with various NN architectures illustrate the method's effectiveness.

Key takeaway

For research scientists developing or analyzing neural networks, this framework offers a way to approximate a finite NN with a GP mixture while bounding the approximation error in Wasserstein distance. Consider it when you need provable error bounds for finite NNs, or when an NN prior must match the functional behavior of a specific GP.

Key insights

Finite neural networks can be approximated by Gaussian process mixtures with provable error bounds.

Principles

NNs and GPs are equivalent only in the infinite-width or infinite-depth limit.
Wasserstein distance quantifies probabilistic model closeness.
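
As a small illustration of how a Wasserstein distance compares two distributions, the sketch below computes the 2-Wasserstein distance between two one-dimensional Gaussians in closed form and checks it against the quantile formula. The choice of order 2, the one-dimensional setting, and the function names `w2_gaussian_1d` and `w2_quantile_1d` are assumptions made for this example and are not taken from the paper.

```python
from math import sqrt
from statistics import NormalDist


def w2_gaussian_1d(m1, s1, m2, s2):
    """Closed-form 2-Wasserstein distance between N(m1, s1^2) and N(m2, s2^2)."""
    return sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def w2_quantile_1d(p, q, n=10_000):
    """Numerical 1-D W2 via the quantile formula, using a midpoint rule on (0, 1)."""
    total = 0.0
    for i in range(n):
        u = (i + 0.5) / n  # midpoint of the i-th probability bin
        total += (p.inv_cdf(u) - q.inv_cdf(u)) ** 2
    return sqrt(total / n)


exact = w2_gaussian_1d(0.0, 1.0, 1.0, 2.0)
approx = w2_quantile_1d(NormalDist(0.0, 1.0), NormalDist(1.0, 2.0))
print(exact, approx)  # the two values should agree closely
```

The distance is zero only when both mean and standard deviation match, which is what makes it a natural measure of closeness between probabilistic models.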

Method

The method iteratively approximates each NN layer's output distribution as a mixture of Gaussian processes, using optimal transport and Wasserstein distance to bound approximation error.
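
The layer-by-layer idea can be sketched on a toy scalar network. This is a simplified illustration and not the paper's algorithm: it assumes a single input point, independent zero-mean Gaussian weights and biases, a ReLU activation, and an equal-weight quantile discretization. It also computes no error bound. All names (`quantize_gaussian`, `two_layer_mixture`, `sigma_w`, `sigma_b`) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def quantize_gaussian(mean, std, k):
    """Discretize N(mean, std^2) into k equally weighted points (bin-midpoint quantiles)."""
    dist = NormalDist(mean, std)
    return [(dist.inv_cdf((i + 0.5) / k), 1.0 / k) for i in range(k)]


def relu(z):
    return max(z, 0.0)


def two_layer_mixture(x, sigma_w, sigma_b, k):
    """Return (weight, mean, std) components approximating y = w2 * relu(w1 * x + b1) + b2."""
    # Layer 1: for a fixed input x, z = w1 * x + b1 is exactly Gaussian.
    std_z = sqrt(sigma_w ** 2 * x ** 2 + sigma_b ** 2)
    support = quantize_gaussian(0.0, std_z, k)
    # Layer 2: conditioned on z = c, y = w2 * relu(c) + b2 is Gaussian,
    # so a discrete approximation of z yields a Gaussian mixture for y.
    components = []
    for c, weight in support:
        std_y = sqrt(sigma_w ** 2 * relu(c) ** 2 + sigma_b ** 2)
        components.append((weight, 0.0, std_y))
    return components


def mixture_variance(components):
    """Variance of a 1-D Gaussian mixture given (weight, mean, std) components."""
    mean = sum(w * m for w, m, _ in components)
    return sum(w * (s ** 2 + m ** 2) for w, m, s in components) - mean ** 2


x, sigma_w, sigma_b = 1.0, 1.0, 0.5
mixture = two_layer_mixture(x, sigma_w, sigma_b, k=64)
# Exact output variance for this toy network: E[relu(z)^2] = Var(z) / 2 for zero-mean z.
exact_var = sigma_b ** 2 + sigma_w ** 2 * (sigma_w ** 2 * x ** 2 + sigma_b ** 2) / 2.0
print(mixture_variance(mixture), exact_var)
```

Increasing `k` refines the discretization and moves the mixture closer to the true output distribution, at the cost of more mixture components.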

In practice

Approximate a finite NN with a GP mixture and bound the approximation error at chosen input points.
Tune NN parameters for Bayesian prior selection.
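
A toy sketch of gradient-based prior tuning follows. It assumes a single linear layer evaluated at one input point, so the Wasserstein distance to the target is available exactly and the gradient is written by hand. In the paper the differentiable quantity is an error bound for a full network; the setup and the names `output_std`, `grad_w2_squared`, and `tune_sigma_w` are illustrative assumptions.

```python
from math import sqrt


def output_std(sigma_w, x, sigma_b):
    """Std of y = w * x + b with w ~ N(0, sigma_w^2) and b ~ N(0, sigma_b^2)."""
    return sqrt(sigma_w ** 2 * x ** 2 + sigma_b ** 2)


def grad_w2_squared(sigma_w, x, sigma_b, target_std):
    """Derivative of (output_std - target_std)^2 with respect to sigma_w.

    For two zero-mean 1-D Gaussians, the squared 2-Wasserstein distance
    is the squared difference of their standard deviations.
    """
    s = output_std(sigma_w, x, sigma_b)
    return 2.0 * (s - target_std) * sigma_w * x ** 2 / s


def tune_sigma_w(x, sigma_b, target_std, sigma_w=1.0, lr=0.1, steps=500):
    """Plain gradient descent on the squared distance to the target marginal."""
    for _ in range(steps):
        sigma_w -= lr * grad_w2_squared(sigma_w, x, sigma_b, target_std)
    return sigma_w


# Target: a GP whose marginal at x = 2.0 is N(0, 1.5^2) (a hypothetical choice).
tuned = tune_sigma_w(x=2.0, sigma_b=0.5, target_std=1.5)
print(tuned, output_std(tuned, 2.0, 0.5))
```

For deeper networks no closed form of this kind is available, which is where a differentiable bound on the distance becomes the quantity to minimize.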

Topics

Finite Neural Networks
Gaussian Processes
Mixture Models
Wasserstein Distance
Optimal Transport

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by JMLR.