Layer-wise Geometric Approximation Rates for Deep Networks

2026-06-24 · Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

This paper introduces a quantitative framework that characterizes how well each intermediate layer of a deep neural network approximates the target function. It constructs a single shared mixed-activation architecture of fixed width 2dN+d+2 and arbitrary depth. Each intermediate readout Φ_ℓ is itself an approximant of the target function f ∈ L^p([0,1]^d), with error bounded by (2d+1) times the L^p modulus of continuity of f at the geometric scale N^(-ℓ). For 1-Lipschitz functions, the bound reduces to the geometric rate (2d+1)N^(-ℓ). Inspired by multigrade deep learning (MGDL), the nested architecture turns depth into a progressive refinement mechanism: new layers refine residual information without redesigning or retraining the earlier parts of the network. Model complexity grows logarithmically in ε^(-1), where ε is the target accuracy, and polynomially in d.
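To make the 1-Lipschitz rate concrete, the following minimal sketch computes the smallest depth index ℓ at which the stated bound (2d+1)N^(-ℓ) falls below a target accuracy, together with the stated width 2dN+d+2. The function names `required_depth` and `network_width` are hypothetical helpers introduced for illustration; they are not from the paper.

```python
def required_depth(d, N, eps):
    """Smallest ell with (2d+1) * N**(-ell) <= eps (1-Lipschitz bound from the summary)."""
    if d < 1 or N < 2 or eps <= 0:
        raise ValueError("need d >= 1, N >= 2 and eps > 0")
    ell = 0
    bound = float(2 * d + 1)  # value of the bound at ell = 0
    while bound > eps:
        ell += 1
        bound = (2 * d + 1) / N ** ell
    return ell


def network_width(d, N):
    """Fixed width 2dN + d + 2 quoted in the summary."""
    return 2 * d * N + d + 2


# Example: d = 2, N = 4, target accuracy 1e-3.
print(required_depth(2, 4, 1e-3))  # 7, since 5/4**6 > 1e-3 but 5/4**7 <= 1e-3
print(network_width(2, 4))         # 20
```

Because the bound shrinks by a factor of N per level, the required ℓ grows like log(1/ε), which is consistent with the logarithmic dependence on ε^(-1) stated above.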

Key takeaway

AI Scientists and Research Scientists designing deep learning architectures for high-accuracy function approximation should consider frameworks that provide explicit layer-wise approximation guarantees. This paper's mixed-activation multigrade deep learning (MGDL) approach supports adaptive depth refinement: new layers progressively reduce the residual error at finer scales without redesigning or retraining prior layers. The result is predictable error bounds and complexity that is logarithmic in the inverse accuracy, which is particularly useful for regression or PDE-related problems requiring multiscale resolution.

Key insights

Network depth can refine a function approximation layer by layer at a quantified rate, enabling adaptive, nested architectures.

Principles

Depth acts as a resolution parameter, not just model size.
Nested architectures allow adaptive depth refinement without full retraining.
Mixed activations combine ReLU channels for localization with sine channels for oscillation encoding.

Method

A mixed-activation network uses ReLU channels for geometric localization and sine channels for cell-wise value realization, recursively constructing refinement modules to approximate residuals at progressively finer scales.
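As a rough illustration of what a mixed-activation layer can look like, the sketch below applies ReLU to one group of channels and sine to the rest. The half-and-half channel split, the random weights, and the name `mixed_activation_layer` are assumptions made for this example; the summary does not specify how the paper assigns channels or sets weights.

```python
import numpy as np


def mixed_activation_layer(x, W, b, n_relu):
    """Affine map followed by ReLU on the first n_relu channels and sine on the rest."""
    z = W @ x + b
    out = np.empty_like(z)
    out[:n_relu] = np.maximum(z[:n_relu], 0.0)  # ReLU channels (localization role)
    out[n_relu:] = np.sin(z[n_relu:])           # sine channels (value-realization role)
    return out


d, N = 2, 3
width = 2 * d * N + d + 2           # fixed width from the summary; 16 for d=2, N=3
rng = np.random.default_rng(0)      # random weights, purely illustrative
W = rng.standard_normal((width, d))
b = rng.standard_normal(width)
x = rng.random(d)                   # a point in [0,1]^d

h = mixed_activation_layer(x, W, b, n_relu=width // 2)  # assumed 50/50 split
print(h.shape)
```

The ReLU outputs are non-negative and piecewise linear, while the sine outputs stay in [-1, 1]; the paper's construction chooses the weights deliberately rather than at random.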

In practice

Design networks for progressive, layer-wise error reduction.
Consider mixed sine/ReLU activations for multiscale function approximation.
Implement adaptive depth by appending new correction terms.
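The idea of appending correction terms can be sketched in one dimension with a simple stand-in: piecewise-constant cell averages on nested grids of N^ℓ cells, rather than the paper's ReLU/sine modules. Each level fits only the residual left by the previous readout and never modifies earlier corrections. The target f(x) = |x - 0.5| (1-Lipschitz), the sampling grid, and all function names are assumptions chosen for illustration.

```python
import numpy as np


def cell_average(values, n_cells):
    """Replace each sample by the mean over its cell (piecewise-constant fit)."""
    blocks = values.reshape(n_cells, -1)  # one row per cell
    return np.repeat(blocks.mean(axis=1), blocks.shape[1])


def nested_readouts(f_vals, N, depth):
    """Build readouts level by level; earlier corrections are never modified."""
    readout = np.zeros_like(f_vals)
    readouts = []
    for level in range(1, depth + 1):
        residual = f_vals - readout                             # what is still missed
        readout = readout + cell_average(residual, N ** level)  # append a correction
        readouts.append(readout)
    return readouts


N, depth = 3, 4
n_samples = 8 * N ** depth                    # divisible by N**level for every level
x = (np.arange(n_samples) + 0.5) / n_samples  # midpoints of a uniform grid on [0,1]
f_vals = np.abs(x - 0.5)                      # a 1-Lipschitz target (assumed example)

readouts = nested_readouts(f_vals, N, depth)
errors = [float(np.max(np.abs(f_vals - r))) for r in readouts]
bounds = [3.0 * N ** (-level) for level in range(1, depth + 1)]  # (2d+1) N^(-l), d = 1

for level, (err, bound) in enumerate(zip(errors, bounds), start=1):
    print(f"level {level}: max error {err:.4f}, bound {bound:.4f}")
```

In this toy setting the maximum error at level ℓ is at most the cell width N^(-ℓ), so it sits inside the (2d+1)N^(-ℓ) bound with d = 1. Extending the depth only requires running further iterations of the loop on the current residual.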

Topics

Deep Neural Networks
Approximation Theory
Multigrade Deep Learning
Mixed Activations
Layer-wise Approximation
Function Approximation

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.