Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection

· Source: JMLR · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new algorithmic framework approximates finite neural networks (NNs) with a mixture of Gaussian processes (GPs) and provides provable bounds on the approximation error. NNs and GPs are exactly equivalent only in the limit of infinite width or depth with i.i.d. parameters; this work covers NNs of finite width and depth whose parameters are not necessarily i.i.d. The approach measures closeness between probabilistic models with the Wasserstein distance and, using tools from optimal transport, iteratively approximates the output distribution of each NN layer as a GP mixture. Crucially, for any NN and any $\epsilon > 0$, the framework returns a GP mixture that is $\epsilon$-close to the NN at a finite set of input points. Because the resulting error bound is differentiable, it can also be used to tune NN parameters so that the network mimics the functional behavior of a given GP, which is useful for prior selection in Bayesian inference. Empirical investigations on regression and classification problems with various NN architectures demonstrate the method's effectiveness.
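The $\epsilon$-closeness guarantee can be written out roughly as follows. This is a sketch of the statement's shape, not the paper's exact theorem: the symbols $f_{\mathrm{NN}}$, $\mathcal{X}$, $M$, $\pi_i$, $m_i$, $K_i$ and the order $p$ of the Wasserstein distance are notation assumed here for illustration.

```latex
% Finite input set and stacked network outputs (notation assumed for illustration)
\mathcal{X} = \{x_1, \dots, x_n\}, \qquad
f_{\mathrm{NN}}(\mathcal{X}) = \big(f_{\mathrm{NN}}(x_1), \dots, f_{\mathrm{NN}}(x_n)\big)

% p-Wasserstein distance between distributions \mu and \nu,
% where \Gamma(\mu, \nu) is the set of couplings with marginals \mu and \nu
W_p(\mu, \nu) = \Big( \inf_{\gamma \in \Gamma(\mu, \nu)}
    \mathbb{E}_{(u, v) \sim \gamma} \, \|u - v\|^p \Big)^{1/p}

% Shape of the guarantee: the NN output distribution at \mathcal{X} is within
% \epsilon of a mixture of M Gaussians induced by the GP components
W_p\Big( \mathrm{Law}\big(f_{\mathrm{NN}}(\mathcal{X})\big), \;
    \sum_{i=1}^{M} \pi_i \, \mathcal{N}\big(m_i(\mathcal{X}), K_i(\mathcal{X}, \mathcal{X})\big) \Big)
    \le \epsilon,
\qquad \pi_i \ge 0, \quad \sum_{i=1}^{M} \pi_i = 1
```

The guarantee is stated at a finite set of input points, which is why each GP component appears here as a multivariate Gaussian evaluated on $\mathcal{X}$.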

Key takeaway

For research scientists developing or analyzing neural networks, this framework offers a way to bound, in Wasserstein distance, how far a finite NN is from a mixture of Gaussian processes, and to align NN behavior with a chosen GP. Consider it when you need provable error bounds for a finite NN, particularly when rigorous uncertainty quantification or matching a specific functional prior is critical for your models.

Key insights

Finite neural networks can be approximated by Gaussian process mixtures with provable error bounds at any finite set of input points.

Method

The method iteratively approximates each NN layer's output distribution as a mixture of Gaussian processes, using optimal transport and Wasserstein distance to bound approximation error.
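To make the role of the Wasserstein distance concrete, here is a minimal sketch of one building block. A GP evaluated at a finite set of inputs is a multivariate Gaussian, and the 2-Wasserstein distance between two Gaussians has a well-known closed form. This is not the paper's algorithm or its mixture bound: the choice of order 2, the RBF kernel, and the names `psd_sqrt`, `gaussian_w2`, and `rbf_kernel` are assumptions made for illustration.

```python
import numpy as np


def psd_sqrt(matrix):
    """Symmetric square root of a positive semi-definite matrix."""
    sym = 0.5 * (matrix + matrix.T)          # remove round-off asymmetry
    eigvals, eigvecs = np.linalg.eigh(sym)
    eigvals = np.clip(eigvals, 0.0, None)    # clip tiny negative eigenvalues
    return (eigvecs * np.sqrt(eigvals)) @ eigvecs.T


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """Closed-form 2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)
    squared = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(squared, 0.0)))


def rbf_kernel(x, lengthscale):
    """RBF kernel matrix on 1-D inputs (illustrative kernel choice)."""
    diff = x[:, None] - x[None, :]
    return np.exp(-0.5 * (diff / lengthscale) ** 2)


# Two zero-mean GPs evaluated at the same five input points.
inputs = np.linspace(-1.0, 1.0, 5)
mean = np.zeros(5)
cov_target = rbf_kernel(inputs, 1.0)
cov_candidate = rbf_kernel(inputs, 0.5)

distance_same = gaussian_w2(mean, cov_target, mean, cov_target)
distance_diff = gaussian_w2(mean, cov_target, mean, cov_candidate)
distance_shift = gaussian_w2(mean + 1.0, cov_target, mean, cov_target)
```

Identical Gaussians give a distance of zero up to round-off, and when the covariances match the distance reduces to the Euclidean distance between the means (here $\sqrt{5}$ for a unit shift at five points). The distance between Gaussian mixtures has no comparable closed form in general, which is part of why bounds on it are needed.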

Topics

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by JMLR.