Kernel Renormalization in Bayesian Deep Neural Networks: the Equivalent Wishart Ansatz in the Proportional Regime
Summary
A new approximate effective theory is presented for predicting the generalization performance of Bayesian multi-layer perceptrons (MLPs) of fixed depth L on arbitrary high-dimensional data. The method introduces an equivalent Wishart Ansatz that captures the dominant stochastic fluctuations of the MLP's hierarchical empirical kernels. This enables a large deviation analysis of the partition function in the proportional limit, where the network width N and the number of training examples P both grow to infinity at a fixed ratio. The result is expressed through a renormalized neural network Gaussian process (NNGP) kernel. Strong representation learning in this limit is encoded in at most L scalar order parameters, which are determined self-consistently. The approach extends to convolutional neural networks (CNNs). There it identifies a hierarchical local kernel renormalization mechanism, which quantifies the complex, data-dependent transformations that finite-width effects impose on the large-width kernels. The theory agrees very well with samples from the Bayesian posterior of finite deep neural networks with depth L ~ O(10) and training-set size P ~ O(10^3) on classic benchmark datasets. It also shows two distinct types of systematic deviations.
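The renormalized kernel in this framework starts from the infinite-width NNGP kernel of the MLP. The sketch below computes that kernel layer by layer with the standard arc-cosine formula. It assumes a bias-free ReLU MLP with weight variance sigma_w2. Both the architecture and the helper name relu_nngp_kernel are illustrative choices, not taken from the paper.

```python
import numpy as np


def relu_nngp_kernel(X, depth, sigma_w2=2.0):
    """Output-layer NNGP kernel of a bias-free ReLU MLP with `depth` hidden layers.

    X: (P, d) array of inputs. sigma_w2: weight variance (illustrative default).
    """
    d = X.shape[1]
    # Covariance of the first-layer preactivations.
    K = sigma_w2 * (X @ X.T) / d
    for _ in range(depth):
        norms = np.sqrt(np.outer(np.diag(K), np.diag(K)))
        # Clip to keep arccos defined despite rounding error.
        cos_t = np.clip(K / norms, -1.0, 1.0)
        theta = np.arccos(cos_t)
        # Arc-cosine formula: sigma_w2 * E[relu(u) relu(v)] for (u, v) ~ N(0, K).
        K = sigma_w2 * norms * (np.sin(theta) + (np.pi - theta) * cos_t) / (2.0 * np.pi)
    return K


rng = np.random.default_rng(0)
X = rng.standard_normal((5, 10))
K = relu_nngp_kernel(X, depth=3)
print(np.round(K, 3))
```

With sigma_w2 = 2, the diagonal of the kernel is preserved from layer to layer. This keeps deep recursions numerically stable, while the off-diagonal entries still change with depth.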
Key takeaway
For AI scientists working on the theoretical foundations of deep learning, this work offers a new framework for predicting generalization in Bayesian MLPs and CNNs. Consider how the equivalent Wishart Ansatz and the kernel renormalization mechanism can sharpen your understanding of finite-width effects and data-dependent kernel transformations in deep networks. For an MLP of depth L, these effects are summarized by at most L scalar order parameters.
Key insights
The paper proposes an equivalent Wishart Ansatz and kernel renormalization to predict Bayesian DNN generalization in the proportional limit.
Principles
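- Capturing the dominant stochastic fluctuations of empirical kernels is key to analyzing finite-width deep networks.
- Analysis in the proportional limit reduces a deep network's behavior to a few self-consistently determined order parameters.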
- Kernel renormalization quantifies finite-width effects.
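This sketch shows how a renormalized kernel can change predictions. It assumes the simplest case, in which finite-width effects rescale the NNGP kernel K by a single scalar Q, and predictions follow the Gaussian-process posterior mean at temperature T. Q is supplied by hand here, whereas the theory fixes it self-consistently. The names gp_predict, Q and T are hypothetical. A linear kernel stands in for the NNGP kernel so the block stays self-contained.

```python
import numpy as np


def gp_predict(K_train, K_test_train, y, Q=1.0, T=1e-2):
    """Posterior mean of GP regression with a renormalized kernel Q * K.

    Q: hypothetical scalar order parameter (Q = 1 gives the plain NNGP predictor).
    T: temperature (noise variance) added to the training-kernel diagonal.
    """
    P = K_train.shape[0]
    A = Q * K_train + T * np.eye(P)
    # Solve the linear system instead of forming an explicit inverse.
    alpha = np.linalg.solve(A, y)
    return Q * K_test_train @ alpha


rng = np.random.default_rng(1)
P, P_test, d = 50, 20, 10
X, X_test = rng.standard_normal((P, d)), rng.standard_normal((P_test, d))
w = rng.standard_normal(d)
# Noisy linear teacher; a linear kernel stands in for the NNGP kernel here.
y = X @ w / np.sqrt(d) + 0.1 * rng.standard_normal(P)
y_test = X_test @ w / np.sqrt(d)
K, K_test = X @ X.T / d, X_test @ X.T / d

for Q in (0.5, 1.0, 2.0):
    mse = np.mean((gp_predict(K, K_test, y, Q=Q, T=0.1) - y_test) ** 2)
    print(f"Q = {Q}: test MSE = {mse:.4f}")
```

Because Q K (Q K + T I)^{-1} equals K (K + (T/Q) I)^{-1}, a scalar renormalization of this kind affects the posterior mean like a change of effective regularization from T to T/Q. The posterior variance is, in addition, multiplied by Q.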
Method
Propose an equivalent Wishart Ansatz for the hierarchical empirical kernels, perform a large deviation analysis of the partition function in the proportional limit, and express the result through a renormalized NNGP kernel.
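The sketch below illustrates the kind of random-matrix model the Ansatz refers to. It assumes that the empirical kernel of a width-N layer is replaced by a Wishart matrix with N degrees of freedom whose mean equals a given kernel K. The paper's exact parameterization may differ, and sample_wishart_kernel is a hypothetical helper.

```python
import numpy as np


def sample_wishart_kernel(K, N, n_samples, rng):
    """Draw n_samples matrices W = G G^T / N, with the N columns of G i.i.d. N(0, K).

    Each W is Wishart with N degrees of freedom and scale K / N, so E[W] = K.
    """
    P = K.shape[0]
    chol = np.linalg.cholesky(K)
    Z = rng.standard_normal((n_samples, P, N))
    G = chol @ Z  # broadcasts (P, P) @ (n_samples, P, N) -> (n_samples, P, N)
    return G @ G.transpose(0, 2, 1) / N


rng = np.random.default_rng(0)
K = np.array([[1.0, 0.6, 0.3],
              [0.6, 1.0, 0.5],
              [0.3, 0.5, 1.0]])
N = 200
W = sample_wishart_kernel(K, N, n_samples=4000, rng=rng)
mean_W = W.mean(axis=0)
var_00 = W[:, 0, 0].var()
# Wishart prediction for entry (0, 0): (K_00^2 + K_00 * K_00) / N.
print(np.round(mean_W, 3), var_00, 2 * K[0, 0] ** 2 / N)
```

For this model, Var(W_ij) = (K_ij^2 + K_ii K_jj) / N, so each entry's fluctuations vanish as N grows at fixed P. In the proportional limit, however, P grows with N and these fluctuations add up across the P × P kernel. Roughly speaking, that is why they cannot simply be neglected there.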
In practice
- Predict generalization of Bayesian MLPs.
- Quantify data-dependent CNN kernel transformations.
- Analyze finite-width effects in deep networks.
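For CNNs, local kernel renormalization can be pictured as a reweighting of patch-to-patch kernels. The sketch below makes several assumptions: a single convolutional layer, non-overlapping 1D patches, linear patch kernels, and a hand-chosen symmetric positive semidefinite matrix Q of order parameters. All function names are hypothetical, and the paper's normalization may differ. With Q equal to the identity, only matching patches contribute, as in the infinite-width kernel of a one-hidden-layer CNN with a dense readout. Off-diagonal entries of Q couple different patches. In the theory these entries would be set self-consistently from the training data.

```python
import numpy as np


def patches(X, F):
    """Split each length-D input into M = D // F non-overlapping patches of size F."""
    n, D = X.shape
    M = D // F
    return X[:, : M * F].reshape(n, M, F)


def local_kernels(X1, X2, F):
    """Linear patch-to-patch kernels K[i, j, a, b] = x1_a[patch i] . x2_b[patch j] / F."""
    return np.einsum("aif,bjf->ijab", patches(X1, F), patches(X2, F)) / F


def renormalized_cnn_kernel(X1, X2, F, Q):
    """Weight the (M, M) grid of local kernels by an order-parameter matrix Q."""
    K_loc = local_kernels(X1, X2, F)
    M = K_loc.shape[0]
    return np.einsum("ij,ijab->ab", Q, K_loc) / M


rng = np.random.default_rng(0)
X = rng.standard_normal((4, 12))  # 4 inputs of length D = 12
F, M = 3, 4
# Q = identity: only matching patches contribute (diagonal, infinite-width-like form).
K_diag = renormalized_cnn_kernel(X, X, F, np.eye(M))
# Hypothetical Q with weak coupling between different patches.
Q_mix = 0.9 * np.eye(M) + 0.1 * np.ones((M, M))
K_mix = renormalized_cnn_kernel(X, X, F, Q_mix)
print(np.round(K_diag, 3))
print(np.round(K_mix - K_diag, 3))
```

If Q = B B^T, then K_R(x, x') = sum_k (sum_i B_ik x_i) · (sum_j B_jk x'_j) / (M F), where x_i is the i-th patch of x. Any positive semidefinite Q therefore yields a valid kernel.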
Topics
- Bayesian Deep Learning
- Kernel Renormalization
- Multi-layer Perceptrons
- Convolutional Neural Networks
- Generalization Performance
- Wishart Ansatz
Best for: Research Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published in the journal Machine Learning.