Finite Neural Networks as Mixtures of Gaussian Processes: From Provable Error Bounds to Prior Selection
Summary
A new algorithmic framework approximates finite neural networks (NNs) with a mixture of Gaussian processes (GPs) and provides provable bounds on the approximation error. The known equivalence between NNs and GPs holds only in the infinite width or depth limit with i.i.d. parameters; this framework instead targets finite NNs, including those with non-i.i.d. parameters. It measures closeness between the two models with the Wasserstein distance and approximates the output distribution of each NN layer, layer by layer, as a GP mixture. For any NN and any $\epsilon > 0$, the framework returns a GP mixture that is $\epsilon$-close to the NN at a finite set of input points. Because the error bound is differentiable, it can also be used to tune NN parameters so that the network mimics the functional behavior of a given GP, which is useful for prior selection in Bayesian inference. Empirical investigations on regression and classification problems with various NN architectures demonstrate the method's effectiveness.
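As a hedged formalization of the $\epsilon$-closeness claim (the notation below is assumed for illustration and is not taken from the original article): let $X = \{x_1, \dots, x_n\}$ be the finite set of input points, $p_{\mathrm{NN}}$ the distribution of the NN outputs at $X$ induced by the distribution over the network's parameters, and $q$ a mixture of $K$ GPs evaluated at $X$. The guarantee then reads roughly as:

$$
q = \sum_{k=1}^{K} \pi_k \, \mathcal{N}\big(m_k(X),\, k_k(X, X)\big), \qquad \mathbb{W}(p_{\mathrm{NN}}, q) \le \epsilon,
$$

where the weights $\pi_k \ge 0$ sum to one, $m_k$ and $k_k$ are the mean function and kernel of the $k$-th GP, and $\mathbb{W}$ is a Wasserstein distance whose order is not stated in this summary.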
Key takeaway
For research scientists developing or analyzing neural networks, this framework offers a novel way to formally quantify NN uncertainty and align NN behavior with Gaussian processes. You should explore integrating this method to establish provable error bounds for finite NNs, particularly when rigorous uncertainty quantification or specific prior matching is critical for your models.
Key insights
- Finite neural networks can be approximated by Gaussian process mixtures with provable error bounds.
Principles
- NNs and GPs are equivalent only in the infinite width or depth limit.
- Wasserstein distance quantifies probabilistic model closeness.
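To make the Wasserstein principle concrete, here is a minimal sketch of the standard closed-form 2-Wasserstein distance between two Gaussians, $W_2^2 = \lVert m_a - m_b \rVert^2 + \operatorname{tr}\big(\Sigma_a + \Sigma_b - 2(\Sigma_b^{1/2} \Sigma_a \Sigma_b^{1/2})^{1/2}\big)$. The function names `psd_sqrt` and `gaussian_w2` are hypothetical, and this is a building block for intuition only, not the article's error bound.

```python
import numpy as np


def psd_sqrt(mat):
    """Symmetric square root of a positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(mat)
    vals = np.clip(vals, 0.0, None)  # guard against tiny negative eigenvalues
    return (vecs * np.sqrt(vals)) @ vecs.T


def gaussian_w2(mean_a, cov_a, mean_b, cov_b):
    """2-Wasserstein distance between N(mean_a, cov_a) and N(mean_b, cov_b)."""
    root_b = psd_sqrt(cov_b)
    cross = psd_sqrt(root_b @ cov_a @ root_b)
    squared = np.sum((mean_a - mean_b) ** 2) + np.trace(cov_a + cov_b - 2.0 * cross)
    return float(np.sqrt(max(squared, 0.0)))  # clamp rounding noise below zero
```

For two Gaussians with equal covariance, the distance reduces to the Euclidean distance between their means, which gives a quick sanity check on the implementation.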
Method
The method iteratively approximates each NN layer's output distribution as a mixture of Gaussian processes, using optimal transport and Wasserstein distance to bound approximation error.
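The following sketch gives some intuition for why a finite NN can look like a GP mixture at a finite set of inputs. It is an illustration under stated assumptions, not the article's algorithm: it assumes a one-hidden-layer ReLU network with standard normal hidden weights and biases and zero-mean Gaussian output weights of variance `out_var / width`. Conditional on one draw of the hidden layer, the outputs at the inputs are jointly Gaussian, so a handful of hidden-layer draws gives a finite Gaussian mixture. Unlike the framework summarized here, this Monte Carlo construction carries no provable error bound. The names `gp_mixture_components` and `sample_mixture` are hypothetical.

```python
import numpy as np


def gp_mixture_components(inputs, width, n_components, out_var=1.0, seed=0):
    """Build equal-weight zero-mean Gaussian components at the given inputs.

    Each component fixes one random hidden layer; the Gaussian output
    weights then induce the covariance (out_var / width) * feats @ feats.T.
    """
    rng = np.random.default_rng(seed)
    dim = inputs.shape[1]
    covs = []
    for _ in range(n_components):
        hidden_w = rng.normal(size=(dim, width))  # assumed N(0, 1) weights
        hidden_b = rng.normal(size=width)         # assumed N(0, 1) biases
        feats = np.maximum(inputs @ hidden_w + hidden_b, 0.0)  # ReLU features
        covs.append(out_var / width * feats @ feats.T)
    weights = np.full(n_components, 1.0 / n_components)
    return weights, np.stack(covs)


def sample_mixture(weights, covs, n_samples, seed=1):
    """Draw function values at the inputs from the Gaussian mixture."""
    rng = np.random.default_rng(seed)
    n_points = covs.shape[1]
    picks = rng.choice(len(weights), size=n_samples, p=weights)
    draws = np.empty((n_samples, n_points))
    for i, k in enumerate(picks):
        draws[i] = rng.multivariate_normal(np.zeros(n_points), covs[k])
    return draws


# Usage: five 1-D input points, four mixture components.
X = np.linspace(-1.0, 1.0, 5).reshape(-1, 1)
weights, covs = gp_mixture_components(X, width=16, n_components=4)
draws = sample_mixture(weights, covs, n_samples=10)
```

Each row of `draws` is one sampled function evaluated at the five inputs. A deeper network would repeat this kind of step layer by layer, which is where controlling the accumulated approximation error becomes the hard part.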
In practice
- Quantify NN uncertainty formally.
- Tune NN parameters for Bayesian prior selection.
Topics
- Finite Neural Networks
- Gaussian Processes
- Mixture Models
- Wasserstein Distance
- Optimal Transport
Best for: Research Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by JMLR.