Theory of learning of high-dimensional controlled non-linear dynamical systems (I): models and methods

· Source: stat.ML updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences, Computational Neuroscience · Depth: Expert, extended

Summary

Neural ordinary differential equations (Neural ODEs) are analyzed within a theoretical framework that addresses their dual dynamical nature: inference dynamics and training dynamics. This work introduces a class of solvable models for high-dimensional controlled non-linear dynamical systems, trained via online stochastic gradient descent (SGD). The authors apply dynamical mean field theory (DMFT) to solve the training dynamics in the high-dimensional limit, deriving learning curves and comparing results with numerical simulations. The framework is presented as a unifying approach for understanding various settings, including multi-layer neural networks (e.g., ResNets), autoregressive models, and generative models, offering precise characterization of feature learning and parameter optimization.
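To make the two kinds of dynamics named in the summary concrete, here is a minimal numerical sketch in NumPy. It is not the paper's model. The weight-tied Euler discretization, the tanh non-linearity, the teacher-student target, and every name (`forward`, `loss_and_grad`, `train`) are assumptions chosen for illustration. The forward pass plays the role of the inference dynamics, and the loop over fresh samples plays the role of the online SGD training dynamics.

```python
import numpy as np


def forward(W, x0, T=1.0, L=10):
    """Inference dynamics: L Euler steps of dx/dt = W tanh(x) / sqrt(N)."""
    c = (T / L) / np.sqrt(x0.size)
    xs = [x0]
    for _ in range(L):
        xs.append(xs[-1] + c * W @ np.tanh(xs[-1]))
    return xs  # full trajectory x_0, ..., x_L


def loss_and_grad(W, x0, y, T=1.0, L=10):
    """Squared loss at the final time and its gradient w.r.t. the shared W."""
    c = (T / L) / np.sqrt(x0.size)
    xs = forward(W, x0, T, L)
    a = xs[-1] - y                       # adjoint at the final step: dloss/dx_L
    loss = 0.5 * float(a @ a)
    grad = np.zeros_like(W)
    for x in reversed(xs[:-1]):          # backward sweep over x_{L-1}, ..., x_0
        phi = np.tanh(x)
        grad += c * np.outer(a, phi)     # contribution of this Euler step to dloss/dW
        a = a + c * (1.0 - phi**2) * (W.T @ a)  # propagate the adjoint one step back
    return loss, grad


def train(N=40, steps=3000, lr=0.2, seed=0):
    """Training dynamics: online SGD, one fresh sample per update (hypothetical setup)."""
    rng = np.random.default_rng(seed)
    W_teacher = rng.standard_normal((N, N))   # assumed teacher generating the targets
    W = rng.standard_normal((N, N))           # student, random initialization
    curve = np.empty(steps)
    for t in range(steps):
        x0 = rng.standard_normal(N)           # new input, never reused
        y = forward(W_teacher, x0)[-1]
        loss, grad = loss_and_grad(W, x0, y)
        curve[t] = loss / N                   # per-coordinate loss before the update
        W -= lr * grad
    return curve
```

Calling `train()` returns the per-coordinate loss measured on each fresh sample before the update. Averaging it over windows of steps gives an empirical learning curve of the kind a high-dimensional theory would be compared against.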

Key takeaway

For AI Scientists and Research Scientists developing or analyzing high-dimensional neural networks, this work offers a theoretical framework built on solvable models. Consider applying dynamical mean field theory (DMFT) to characterize the coupled inference and training dynamics of Neural ODEs, especially for architectures such as ResNets or autoregressive models. The approach offers a path to derive learning curves in the high-dimensional limit and to predict model alignment, moving from purely empirical observation toward theoretically grounded predictions of performance.

Key insights

In the high-dimensional limit, dynamical mean field theory (DMFT) solves the coupled inference and training dynamics of the paper's solvable class of Neural ODE models.

Method

The method formulates online SGD through a Lagrangian, derives the corresponding Euler-Lagrange equations, and solves the training dynamics with dynamical mean field theory (DMFT). A path-integral representation yields self-consistent stochastic processes, from which the learning curves follow.
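As an illustration of what a Lagrangian formulation of this kind looks like, the block below writes out the standard ODE-constrained form for a single sample. The summary does not give the paper's own action, so the symbols here (state $x$, control $\theta$, multiplier $p$, vector field $f$, loss $\ell$, learning rate $\eta$) are assumptions, and the result is the textbook adjoint system rather than the authors' equations.

```latex
% Action for one sample (x(0), y): loss at the final time plus the dynamics constraint
\mathcal{S}[x, p, \theta]
  = \ell\bigl(x(T), y\bigr)
  + \int_0^T p(t)^{\top} \Bigl( f\bigl(x(t), \theta(t)\bigr) - \dot{x}(t) \Bigr)\, dt

% Euler-Lagrange (stationarity) conditions
\frac{\delta \mathcal{S}}{\delta p(t)} = 0
  \;\Rightarrow\; \dot{x}(t) = f\bigl(x(t), \theta(t)\bigr)
  \quad \text{(forward, inference dynamics)}

\frac{\delta \mathcal{S}}{\delta x(t)} = 0
  \;\Rightarrow\; \dot{p}(t) = -\bigl(\partial_x f\bigr)^{\top} p(t),
  \qquad p(T) = \nabla_x \ell\bigl(x(T), y\bigr)
  \quad \text{(backward, adjoint dynamics)}

% Online SGD step on the control, using one fresh sample per update
\theta(t) \;\leftarrow\; \theta(t) - \eta\, \frac{\delta \mathcal{S}}{\delta \theta(t)},
  \qquad \frac{\delta \mathcal{S}}{\delta \theta(t)} = \bigl(\partial_\theta f\bigr)^{\top} p(t)
```

The boundary term $p(T) = \nabla_x \ell$ comes from integrating $p^{\top}\dot{x}$ by parts with $x(0)$ held fixed. In this form the training update couples to the forward and backward trajectories only through $x(t)$ and $p(t)$.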

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.