Kernel Renormalization in Bayesian Deep Neural Networks: the Equivalent Wishart Ansatz in the Proportional Regime

Source: Machine Learning · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

This work presents an approximate method for predicting the generalization performance of Bayesian multi-layer perceptrons (MLPs) of fixed depth L on arbitrary high-dimensional data. The method introduces an equivalent Wishart Ansatz that captures the dominant stochastic fluctuations of the hierarchical empirical kernels in MLPs. The Ansatz makes a large deviation analysis of the partition function tractable in the proportional limit, with the result expressed through a renormalized NNGP (neural network Gaussian process) kernel. Strong representation learning in this limit is encoded in at most L scalar order parameters, which are determined self-consistently. The approach extends to convolutional neural networks (CNNs), where it identifies a hierarchical local kernel renormalization mechanism that quantifies the complex, data-dependent transformations that finite-width effects induce on the large-width kernels. The theory agrees very well with sampling experiments from the Bayesian posterior of finite deep neural networks with depths L ~ O(10) and P ~ O(10^3) on classic benchmark datasets, while also exhibiting two distinct types of systematic deviations.

Key takeaway

For AI scientists working on the theoretical underpinnings of deep learning, this work provides a framework for predicting generalization performance in Bayesian MLPs and CNNs. Consider how the equivalent Wishart Ansatz and the kernel renormalization mechanism can inform your understanding of finite-width effects and data-dependent kernel transformations in deep networks. The approach quantifies these behaviors with at most L scalar order parameters for a network of depth L.

Key insights

The paper proposes an equivalent Wishart Ansatz and kernel renormalization to predict Bayesian DNN generalization in the proportional limit.
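
To give some intuition for why a Wishart law is a natural model for empirical kernels, consider the simplest case: a single linear hidden layer with i.i.d. Gaussian weights. There the empirical kernel, multiplied by the width, is exactly Wishart-distributed around the infinite-width kernel, and its fluctuations shrink as the width grows. The sketch below checks this numerically. It is only an illustration under these simplifying assumptions, not the paper's construction for deep nonlinear networks, and all function names and sizes are hypothetical.

```python
import numpy as np

def infinite_width_kernel(X):
    """Infinite-width kernel of a linear layer: K = X X^T / D."""
    return X @ X.T / X.shape[1]

def empirical_linear_kernel(X, width, rng):
    """Empirical kernel of one linear hidden layer with N(0, 1) weights.

    Each hidden unit's pre-activation vector is N(0, K), so
    width * K_emp is a Wishart matrix with `width` degrees of freedom.
    """
    D = X.shape[1]
    W = rng.standard_normal((D, width))
    H = X @ W / np.sqrt(D)        # pre-activations, shape (P, width)
    return H @ H.T / width        # empirical kernel, shape (P, P)

def relative_fluctuation(X, width, rng, n_samples=200):
    """Mean Frobenius distance to the infinite-width kernel, normalized."""
    K_inf = infinite_width_kernel(X)
    errs = [
        np.linalg.norm(empirical_linear_kernel(X, width, rng) - K_inf)
        for _ in range(n_samples)
    ]
    return np.mean(errs) / np.linalg.norm(K_inf)

rng = np.random.default_rng(0)
P, D = 20, 50                     # hypothetical sizes, chosen for speed
X = rng.standard_normal((P, D))

fluct_narrow = relative_fluctuation(X, width=20, rng=rng)    # width ~ P
fluct_wide = relative_fluctuation(X, width=2000, rng=rng)    # width >> P
print(f"width=20:   {fluct_narrow:.3f}")
print(f"width=2000: {fluct_wide:.3f}")
```

When the width is comparable to the number of examples P, as in the proportional regime, the kernel fluctuations stay of order one instead of vanishing, which is why they have to be modeled rather than neglected.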

Method

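Propose an equivalent Wishart Ansatz for the hierarchical empirical kernels, use it to carry out a large deviation analysis of the partition function in the proportional limit, and express the resulting predictions through a renormalized NNGP kernel whose scalar order parameters are fixed self-consistently.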
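
A minimal sketch of what prediction with a rescaled NNGP kernel can look like is given below. It uses the standard ReLU NNGP recursion (the arc-cosine kernel) and a Gaussian process posterior mean. Two points are assumptions made for illustration and are not taken from the paper: the order parameters enter as one multiplicative scalar per layer (`q`), and their values are set by hand instead of being solved self-consistently. The function names and the `temperature` parameter are likewise hypothetical.

```python
import numpy as np

def relu_nngp_step(K, sigma_w2=2.0):
    """One layer of the ReLU NNGP recursion (arc-cosine kernel, no bias)."""
    d = np.sqrt(np.diag(K))
    norm = np.outer(d, d)
    cos_t = np.clip(K / norm, -1.0, 1.0)   # clip guards against round-off
    theta = np.arccos(cos_t)
    return sigma_w2 / (2 * np.pi) * norm * (np.sin(theta) + (np.pi - theta) * cos_t)

def rescaled_nngp_kernel(X, q):
    """Depth-len(q) kernel with one scalar per layer.

    ASSUMPTION: the multiplicative per-layer form is illustrative only;
    q = [1, ..., 1] recovers the plain infinite-width NNGP kernel.
    """
    K = X @ X.T / X.shape[1]
    for q_l in q:
        K = q_l * relu_nngp_step(K)
    return K

def gp_posterior_mean(K, y_train, temperature):
    """GP mean on the test rows of K, given the first len(y_train) rows as train."""
    n = len(y_train)
    K_tt = K[:n, :n]                        # train-train block
    K_st = K[n:, :n]                        # test-train block
    alpha = np.linalg.solve(K_tt + temperature * np.eye(n), y_train)
    return K_st @ alpha

rng = np.random.default_rng(0)
n_train, n_test, D = 30, 5, 10              # hypothetical sizes
X = rng.standard_normal((n_train + n_test, D))
y_train = np.sign(X[:n_train, 0])           # toy labels

K_plain = rescaled_nngp_kernel(X, q=[1.0, 1.0, 1.0])
K_resc = rescaled_nngp_kernel(X, q=[0.6, 0.8, 0.9])   # hand-picked values

pred_plain = gp_posterior_mean(K_plain, y_train, temperature=0.1)
pred_resc = gp_posterior_mean(K_resc, y_train, temperature=0.1)
print(pred_plain)
print(pred_resc)
```

Because ReLU is positively homogeneous, per-layer scalars in this toy version collapse into a single overall factor on the kernel, which changes predictions only through its balance with the temperature. The data-dependent transformations the summary describes for CNNs go beyond this scalar picture.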

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.