When Do Local Score Models Extrapolate Across Size? A Diagnostic Theory and Benchmark

2026-06-08 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A new diagnostic theory and benchmark, "When Do Local Score Models Extrapolate Across Size?", addresses size transfer in scientific generative modeling, that is, applying a score model trained at one system size to larger systems. Translation-invariant architectures make evaluation on larger systems possible, but the paper argues that stable extrapolation is governed primarily by the quasi-locality of the Gaussian-smoothed score, not by architectural locality alone. The theory is formalized as a size-uniform comparison theorem for local marginals under reverse diffusion, and it explains how distant perturbations can affect local score components through the posterior covariance. Its central condition is that a local model succeeds only if its receptive field covers the smoothed score's response range. To test this, the paper introduces Finite-Depth Local Flow (FDLF), a white-box benchmark with exact scores, exact densities, and controllable response ranges. The experiments confirm the prediction: extrapolation is stable when spatial mixing keeps the smoothed score quasi-local relative to the receptive field, and it fails when spatial mixing is weakened.
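The posterior-covariance mechanism can be made concrete with a standard identity (often called second-order Tweedie). The derivation below assumes the variance-exploding corruption y = x + σε with ε ~ N(0, I); the paper's noise schedule and notation may differ.

```latex
p_\sigma(y) = \int p(x)\,\mathcal{N}(y;\, x,\, \sigma^2 I)\,dx,
\qquad
s_\sigma(y) := \nabla_y \log p_\sigma(y) = \frac{\mathbb{E}[x \mid y] - y}{\sigma^2},
\\[4pt]
\nabla_y \mathbb{E}[x \mid y] = \frac{\operatorname{Cov}[x \mid y]}{\sigma^2}
\quad\Longrightarrow\quad
\frac{\partial s_{\sigma,i}}{\partial y_j}(y)
= \frac{\operatorname{Cov}[x_i, x_j \mid y]}{\sigma^4} - \frac{\delta_{ij}}{\sigma^2}.
```

For i ≠ j, the response of the score at site i to a perturbation at site j equals the posterior covariance between the two sites divided by σ⁴, so long-range posterior correlation becomes long-range score response.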

Key takeaway

For AI Scientists and Research Scientists developing generative models for scientific applications, size transfer is a critical concern. You should design models whose receptive field covers the Gaussian-smoothed score's response range, particularly where spatial mixing is weak. Architectural locality alone is insufficient: stable extrapolation hinges on the smoothed score's quasi-locality. Consider using the Finite-Depth Local Flow (FDLF) benchmark to diagnose and validate your model's size-extrapolation behavior.

Key insights

Stable size extrapolation in generative models depends on the Gaussian-smoothed score's quasi-locality relative to the model's receptive field.

Principles

Architectural locality alone does not guarantee stable size extrapolation.
Smoothed score quasi-locality governs stable size extrapolation.
Receptive field must cover the smoothed score's response range.
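One hypothetical way to make "response range" quantitative (the notation is illustrative and not taken from the paper): write s_σ for the Gaussian-smoothed score at noise level σ, r_RF for the model's receptive radius in lattice sites, and assume the score Jacobian decays exponentially with lattice distance at a decay length ξ_σ. The tolerance-ε response range and the coverage condition then follow directly:

```latex
\left|\frac{\partial s_{\sigma,i}}{\partial y_j}(y)\right| \le C\,e^{-|i-j|/\xi_\sigma}
\quad\Longrightarrow\quad
\left|\frac{\partial s_{\sigma,i}}{\partial y_j}(y)\right| \le \varepsilon
\ \text{ whenever }\ |i-j| \ge R_\varepsilon(\sigma) := \xi_\sigma \ln\frac{C}{\varepsilon},
\\[4pt]
\text{coverage condition:}\qquad r_{\mathrm{RF}} \ge \max_{\sigma} R_\varepsilon(\sigma).
```

For example, with ξ_σ = 2 sites, C = 1 and ε = 10⁻³, R_ε ≈ 2 · ln 1000 ≈ 13.8, so under these assumptions the receptive radius would need to reach at least 14 sites. Because the decay length can change with σ, the condition plausibly has to hold at every noise level the reverse diffusion visits.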

Method

The paper formalizes a size-uniform comparison theorem for local marginals under reverse diffusion. It introduces Finite-Depth Local Flow (FDLF), a white-box diagnostic benchmark with exact scores, densities, and controllable response ranges.
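FDLF itself is not reproduced here. As a much simpler white-box stand-in with the same three ingredients (exact score, exact density, tunable response range), consider a stationary Gaussian AR(1) chain, an assumption made purely for illustration: x_i = ρ x_{i-1} + e_i with e_i ~ N(0, 1). Its clean score -Σ⁻¹x is exactly nearest-neighbor, yet the Gaussian-smoothed score -(Σ + σ²I)⁻¹y is only quasi-local. The sketch below computes one row of the smoothed score's Jacobian, measures its response range at a relative tolerance, and loosely mimics a size-uniform local comparison by checking a central window of that row across chain lengths. All function names are hypothetical.

```python
def score_jacobian_row(n, rho, sigma, center):
    """Row `center` of the Jacobian of the sigma-smoothed score of a Gaussian AR(1) chain.

    Toy assumption: x_i = rho * x_{i-1} + e_i, e_i ~ N(0, 1), started in stationarity,
    so Cov[x_i, x_j] = rho**|i - j| / (1 - rho**2). Smoothing with N(0, sigma^2 I) gives
    p_sigma = N(0, Sigma + sigma^2 I), whose score is s(y) = -(Sigma + sigma^2 I)^{-1} y.
    """
    # Build A = Sigma + sigma^2 I explicitly (dense, n x n).
    a = [[rho ** abs(i - j) / (1.0 - rho * rho) + (sigma * sigma if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [1.0 if i == center else 0.0 for i in range(n)]
    # Solve A x = e_center by Gaussian elimination with partial pivoting.
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[piv] = a[piv], a[col]
        b[col], b[piv] = b[piv], b[col]
        for r in range(col + 1, n):
            f = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= f * a[col][c]
            b[r] -= f * b[col]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (b[r] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    # A is symmetric, so column `center` of A^{-1} equals row `center`; negate for the score.
    return [-v for v in x]


def response_range(row, center, tol=1e-3):
    """Largest distance |i - center| at which |row[i]| exceeds tol * |row[center]|."""
    ref = abs(row[center])
    return max(abs(i - center) for i, v in enumerate(row) if abs(v) > tol * ref)


def window_gap(n_small, n_large, rho, sigma, half_width=5):
    """Max difference between the central Jacobian windows of two (odd) chain lengths."""
    def window(n):
        c = n // 2
        row = score_jacobian_row(n, rho, sigma, c)
        return row[c - half_width: c + half_width + 1]
    return max(abs(u - v) for u, v in zip(window(n_small), window(n_large)))


if __name__ == "__main__":
    n, c = 61, 30
    for rho, sigma in [(0.9, 0.0), (0.5, 1.0), (0.9, 1.0), (0.9, 3.0)]:
        r = response_range(score_jacobian_row(n, rho, sigma, c), c)
        print(f"rho={rho} sigma={sigma}: response range {r}")
    print("window gap, n=31 vs n=61:", window_gap(31, 61, 0.9, 1.0))
    print("window gap, n=11 vs n=61:", window_gap(11, 61, 0.9, 1.0))
```

In this toy, raising ρ (slower decay of correlations, a rough analogue of weaker spatial mixing) or raising σ makes the Jacobian row decay more slowly, while the clean score's range stays at one site. The window comparison shows the local Jacobian becoming essentially independent of chain length once the chain is long compared with the response range.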

In practice

Evaluate model receptive field against smoothed score response.
Use FDLF benchmark for diagnostic testing of size transfer.
Check that spatial mixing is strong enough to keep the smoothed score quasi-local at the receptive-field scale.
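The receptive-field side of the check is simple arithmetic for stride-1 convolutional stacks. The sketch assumes a 1D lattice model built from odd, centered convolution kernels (an assumption; the paper's architectures may be graph networks or other local operators), so each layer widens the receptive radius by dilation × (kernel_size − 1) / 2 sites per side. `ConvLayer`, `receptive_radius`, `covers` and `min_depth` are hypothetical helpers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ConvLayer:
    """Hypothetical description of one stride-1, centered 1D convolution."""
    kernel_size: int  # must be odd so the kernel is centered on the output site
    dilation: int = 1


def receptive_radius(layers):
    """Number of sites on each side that can influence one output site."""
    radius = 0
    for layer in layers:
        if layer.kernel_size % 2 == 0:
            raise ValueError("this sketch assumes odd, centered kernels")
        # Each layer reaches dilation * (kernel_size - 1) / 2 further sites per side.
        radius += layer.dilation * (layer.kernel_size - 1) // 2
    return radius


def covers(layers, response_range):
    """True if the receptive radius reaches at least `response_range` sites."""
    return receptive_radius(layers) >= response_range


def min_depth(kernel_size, dilation, response_range):
    """Fewest identical layers whose receptive radius covers `response_range`."""
    if kernel_size % 2 == 0:
        raise ValueError("this sketch assumes odd, centered kernels")
    per_layer = dilation * (kernel_size - 1) // 2
    if per_layer == 0:
        raise ValueError("a 1-site kernel never grows the receptive field")
    return math.ceil(response_range / per_layer)


if __name__ == "__main__":
    net = [ConvLayer(3)] * 4
    print(receptive_radius(net), covers(net, 6), min_depth(3, 1, 6))  # prints: 4 False 6
```

The response range passed to `covers` has to come from a separate diagnostic, such as a benchmark with exact scores. Since the smoothed score changes along the reverse diffusion, the largest range over the noise levels the sampler visits is the conservative input.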

Topics

Generative Modeling
Size Transfer
Score-based Models
Gaussian-smoothed Score
Receptive Fields
Finite-Depth Local Flow
Spatial Mixing

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.