Kernel Renormalization in Bayesian Deep Neural Networks: the Equivalent Wishart Ansatz in the Proportional Regime

2026-05-28 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The paper presents an approximate method for predicting the generalization performance of Bayesian multi-layer perceptrons (MLPs) of fixed depth L on arbitrary high-dimensional data. It proposes an equivalent Wishart Ansatz that captures the dominant stochastic fluctuations of the hierarchical empirical kernels in MLPs. The Ansatz enables a large deviation analysis of the partition function in the proportional limit, with the result expressed through a renormalized NNGP (neural network Gaussian process) kernel. Strong representation learning in this limit is encoded in at most L scalar order parameters, which are determined self-consistently. The approach extends to convolutional neural networks (CNNs), where it identifies a hierarchical local kernel renormalization mechanism that quantifies the complex, data-dependent transformations that finite-width effects induce on the large-width kernels. The theory agrees very well with experiments that sample from the Bayesian posterior of finite deep neural networks with depths L ~ O(10) and P ~ O(10^3) on classic benchmark datasets, apart from two distinct types of systematic deviations.

Key takeaway

For AI scientists working on the theory of deep learning, this work offers a framework for predicting generalization performance in Bayesian MLPs and CNNs. The equivalent Wishart Ansatz and the kernel renormalization mechanism are worth studying for what they say about finite-width effects and data-dependent kernel transformations in deep networks. The approach describes these behaviors with at most L scalar order parameters.

Key insights

The paper proposes an equivalent Wishart Ansatz and kernel renormalization to predict Bayesian DNN generalization in the proportional limit.

Principles

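Capturing the dominant stochastic fluctuations of empirical kernels is central to analyzing finite-width deep networks.
In the proportional limit, strong representation learning reduces to at most L self-consistently determined scalar order parameters.
Kernel renormalization quantifies how finite-width effects transform the large-width kernels.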
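
To make the first principle concrete, here is a minimal sketch of kernel fluctuations at finite width. It covers only the simplest case, a single random linear layer with Gaussian weights, where the empirical kernel (times the width) is exactly Wishart-distributed around its infinite-width limit. It is not the paper's Ansatz for deep nonlinear networks, whose details this summary does not give. The function names, sizes, and seed are illustrative assumptions.

```python
import math
import random


def limit_kernel(X):
    """Infinite-width kernel of a linear layer: X X^T / D."""
    D = len(X[0])
    return [[sum(p * q for p, q in zip(xa, xb)) / D for xb in X] for xa in X]


def empirical_kernel(X, width, rng):
    """Empirical kernel of one random linear layer with `width` hidden units.

    Weights are i.i.d. N(0, 1/D), so each hidden unit evaluated on the P inputs
    is a Gaussian vector with covariance X X^T / D. Averaging the outer products
    of `width` such vectors gives a Wishart-distributed kernel (up to 1/width).
    """
    P, D = len(X), len(X[0])
    K = [[0.0] * P for _ in range(P)]
    for _ in range(width):
        w = [rng.gauss(0.0, 1.0 / math.sqrt(D)) for _ in range(D)]
        h = [sum(wi * xi for wi, xi in zip(w, x)) for x in X]
        for a in range(P):
            for b in range(P):
                K[a][b] += h[a] * h[b] / width
    return K


def rms_deviation(A, B):
    """Root-mean-square entrywise difference between two P x P matrices."""
    P = len(A)
    total = sum((A[a][b] - B[a][b]) ** 2 for a in range(P) for b in range(P))
    return math.sqrt(total / P ** 2)


rng = random.Random(0)  # fixed seed, chosen arbitrarily for reproducibility
X = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(4)]  # P=4 inputs, D=8

K_inf = limit_kernel(X)
dev_narrow = rms_deviation(empirical_kernel(X, 20, rng), K_inf)
dev_wide = rms_deviation(empirical_kernel(X, 2000, rng), K_inf)
print(dev_narrow, dev_wide)
```

The deviation from the limiting kernel shrinks roughly as one over the square root of the width, so the wide layer sits much closer to the limit than the narrow one. These are the fluctuations that stay relevant when the width is not taken to infinity first.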

Method

Propose an equivalent Wishart Ansatz for the hierarchical empirical kernels, use it to carry out a large deviation analysis of the partition function in the proportional limit, and express the result through a renormalized NNGP kernel.
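
As a rough illustration of what a renormalized NNGP kernel could look like, the sketch below runs the standard infinite-width ReLU NNGP recursion (the arc-cosine kernel) and rescales each layer's kernel by a scalar. Treating the order parameters as one multiplicative factor per layer is an assumption of this sketch, not something the summary confirms. The values in `q_per_layer` are arbitrary placeholders and are not solved self-consistently as the method requires.

```python
import math


def relu_nngp_step(K, weight_var=2.0):
    """One layer of the infinite-width ReLU NNGP recursion (arc-cosine kernel).

    With weight_var=2.0 the diagonal of the kernel is preserved across layers.
    """
    P = len(K)
    out = [[0.0] * P for _ in range(P)]
    for a in range(P):
        for b in range(P):
            norm = math.sqrt(K[a][a] * K[b][b])
            # Clamp to guard against round-off pushing the cosine outside [-1, 1].
            cos_t = max(-1.0, min(1.0, K[a][b] / norm))
            theta = math.acos(cos_t)
            out[a][b] = (weight_var * norm / (2.0 * math.pi)
                         * (math.sin(theta) + (math.pi - theta) * cos_t))
    return out


def renormalized_kernel(K0, q_per_layer):
    """Apply the ReLU recursion once per entry of q_per_layer, rescaling by q.

    q_per_layer holds hypothetical scalar order parameters (one per layer);
    q = 1 for every layer recovers the plain NNGP kernel.
    """
    K = K0
    for q in q_per_layer:
        K = [[q * v for v in row] for row in relu_nngp_step(K)]
    return K


# Three toy inputs in two dimensions; the input kernel is X X^T / D.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
K0 = [[sum(p * r for p, r in zip(xa, xb)) / len(xa) for xb in X] for xa in X]

K_nngp = renormalized_kernel(K0, [1.0, 1.0])  # plain two-layer NNGP kernel
K_ren = renormalized_kernel(K0, [0.8, 1.3])   # placeholder order parameters
```

Because the ReLU kernel map is homogeneous of degree one, the scalar factors in this simplified sketch multiply into a single overall scale (here 0.8 × 1.3 = 1.04). It therefore does not capture the data-dependent local kernel renormalization that the source describes for CNNs.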

In practice

Predict generalization of Bayesian MLPs.
Quantify data-dependent CNN kernel transformations.
Analyze finite-width effects in deep networks.

Topics

Bayesian Deep Learning
Kernel Renormalization
Multi-layer Perceptrons
Convolutional Neural Networks
Generalization Performance
Wishart Ansatz

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.