Gradient descent at the Edge of Stability: free energy model and kinetic description of the two-layer network

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

A new continuous-time effective model is proposed for analyzing gradient descent in the Edge of Stability (EoS) regime, where large learning rates induce persistent oscillations in loss and sharpness. The model tracks the evolution of the averaged trajectory θ coupled with the time-averaged covariance Σ of its fast oscillations. The analysis identifies an "effective free energy" F(θ,Σ), which combines the original risk functional with a curvature-related "entropic" term, as the natural quantity to monitor. The model captures oscillation envelopes and explains sharpness increases, even when the oscillation dynamics evolve on a timescale similar to that of the averaged weights. For wide two-layer neural networks, a mean-field limit yields a novel kinetic equation for the joint distribution of weights and fluctuations. Numerical experiments on matrix factorization (d=2, L=3, η=0.077) and CIFAR-10 (n=500 images, 2-layer CNN/MLP, η=0.02) support the model's accuracy and the predictive power of the effective free energy.

Key takeaway

Machine learning engineers and research scientists training deep networks in the Edge of Stability regime should consider monitoring the proposed effective free energy F(θ,Σ) rather than the loss E(θ) alone. The continuous-time model describes the optimization dynamics more faithfully than the loss curve by itself, in particular by tracking oscillation envelopes and explaining sharpness increases. Integrating the coupled ODEs for θ and Σ can help predict training behavior and may guide hyperparameter tuning toward solutions with better generalization properties.

Key insights

A new model tracks the averaged trajectory and the covariance of its oscillations, revealing an effective free energy for Edge of Stability dynamics.

Principles

Effective free energy F(θ,Σ) is the natural quantity to monitor in unstable optimization regimes.
The implicit bias of the Edge of Stability regime drives optimizers toward flatter minima, which are often associated with better generalization.
The top eigenvectors of the Hessian are locally stable, which informs the geometry of the loss landscape at low energies.
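
To see why a curvature-weighted term can appear next to the risk, a standard second-order expansion is useful. This is an illustration only and is not claimed to be the paper's exact definition of F. It assumes the iterate is the averaged point θ plus a zero-mean fluctuation √η δθ whose covariance is Σ:

```latex
\begin{aligned}
\big\langle E(\theta + \sqrt{\eta}\,\delta\theta) \big\rangle
  &= E(\theta)
   + \sqrt{\eta}\,\nabla E(\theta)^{\top} \langle \delta\theta \rangle
   + \frac{\eta}{2}\,\big\langle \delta\theta^{\top}\,\nabla^{2} E(\theta)\,\delta\theta \big\rangle
   + O(\eta^{3/2}) \\
  &= E(\theta)
   + \frac{\eta}{2}\,\operatorname{tr}\!\big(\nabla^{2} E(\theta)\,\Sigma\big)
   + O(\eta^{3/2}),
\qquad \langle \delta\theta \rangle = 0,\quad
\Sigma = \langle \delta\theta\,\delta\theta^{\top} \rangle .
\end{aligned}
```

Under these assumptions, the averaged loss already depends on Σ through the Hessian, so a quantity that couples θ and Σ carries information that E(θ) by itself does not.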

Method

The model uses an ansatz θ̃_k = θ_k + √η δθ_k to derive coupled continuous-time ODEs for the average trajectory (θ) and the covariance of oscillations (Σ).
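
The decomposition behind the ansatz can be sketched numerically. The snippet below is a minimal illustration, not the paper's procedure: the function name `split_trajectory`, the even averaging window, and the synthetic period-2 trajectory are all assumptions made for the example. It recovers a windowed mean (a stand-in for θ) and the covariance of the rescaled fluctuations (a stand-in for Σ) from raw iterates.

```python
import numpy as np

def split_trajectory(iterates, eta, window=2):
    """Split raw iterates into a windowed mean and a fluctuation covariance.

    iterates: array of shape (K, d) holding the raw gradient descent iterates.
    Returns (theta_bar, sigma) computed over the last `window` iterates,
    with fluctuations rescaled by 1/sqrt(eta) as in the ansatz.
    """
    iterates = np.asarray(iterates, dtype=float)
    if window % 2 != 0:
        raise ValueError("use an even window so period-2 oscillations average out")
    tail = iterates[-window:]
    theta_bar = tail.mean(axis=0)                # estimate of the averaged point
    delta = (tail - theta_bar) / np.sqrt(eta)    # rescaled fluctuations
    sigma = delta.T @ delta / window             # time-averaged covariance
    return theta_bar, sigma

# Synthetic trajectory (hypothetical): a fixed point `a` plus a period-2
# oscillation of amplitude sqrt(eta) * b, mimicking the fast EoS oscillation.
eta = 0.02
a = np.array([1.0, -2.0])
b = np.array([0.5, 0.25])
signs = np.array([(-1.0) ** k for k in range(10)])
iterates = a + np.sqrt(eta) * signs[:, None] * b

theta_bar, sigma = split_trajectory(iterates, eta, window=4)
```

For this synthetic input the windowed mean equals `a` and the covariance equals the outer product of `b` with itself, up to floating-point error. On a real training run the averaged point drifts, so the window length becomes a tuning choice.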

In practice

Monitor F(θ,Σ), not E(θ) alone, for better insight into EoS optimization.
Initialize Σ by running relaxation steps, then sampling centered gradient steps.
For high-dimensional problems, track top Hessian eigenvectors instead of the full Σ matrix.
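
The last item in the list above relies on top Hessian eigenvectors, which can be obtained without forming the Hessian. The sketch below is one common way to do it and is not taken from the paper: it uses power iteration with finite-difference Hessian-vector products. The helper names `hvp` and `top_eigenpair` and the diagonal quadratic test problem are hypothetical. The comparison against 2/η uses the classical stability threshold of gradient descent on a quadratic.

```python
import numpy as np

def hvp(grad_fn, theta, v, eps=1e-5):
    """Central finite-difference approximation of the Hessian-vector product."""
    return (grad_fn(theta + eps * v) - grad_fn(theta - eps * v)) / (2 * eps)

def top_eigenpair(grad_fn, theta, iters=200, seed=0):
    """Power iteration for the Hessian eigenvalue of largest magnitude."""
    rng = np.random.default_rng(seed)
    v = rng.standard_normal(theta.shape)
    v /= np.linalg.norm(v)
    for _ in range(iters):
        w = hvp(grad_fn, theta, v)
        v = w / np.linalg.norm(w)
    sharpness = v @ hvp(grad_fn, theta, v)   # Rayleigh quotient at convergence
    return sharpness, v

# Hypothetical test problem: E(theta) = 0.5 * theta^T A theta, so grad = A theta.
A = np.diag([30.0, 5.0, 1.0])
grad_fn = lambda theta: A @ theta
theta = np.ones(3)

sharpness, v = top_eigenpair(grad_fn, theta)

eta = 0.077                      # learning rate quoted for matrix factorization
threshold = 2.0 / eta            # classical stability threshold, about 25.97
unstable = sharpness > threshold
```

Power iteration returns the eigenvalue of largest absolute value, which matches the sharpness only when the dominant curvature is positive. In an autodiff framework the finite-difference product would normally be replaced by an exact Hessian-vector product.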

Topics

Gradient Descent
Edge of Stability
Neural Network Optimization
Effective Free Energy
Kinetic Equations
Mean-Field Theory

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.