A Rod Flow Model for Adam at the Edge of Stability
Summary
This research extends the "rod flow" continuous-time modeling framework to adaptive gradient methods like Adam, which are observed to operate at the "edge of stability" (EoS). The original rod flow, introduced by Regis and Chewi (2026) for gradient descent, models consecutive iterates as an extended one-dimensional object, or "rod," whose length and orientation encode oscillation. This paper generalizes rod flow to momentum methods (heavy ball, Nesterov) by lifting the rod to a joint phase space of parameters and first moment $(w,m)$, and to adaptive methods (RMSProp, Adam, NAdam) by treating the second moment $\nu$ as a smooth auxiliary variable. The derived rod flow ODEs for Adam combine these extensions. Empirical evaluations on MLP, CNN, and ViT architectures trained on CIFAR-10 demonstrate that the Adam rod flow accurately tracks discrete iterates through the EoS regime, outperforming the "stable flow" (naïve continuous-time limit) by several orders of magnitude, and correctly stabilizes at theoretically predicted preconditioned sharpness thresholds.
Key takeaway
For research scientists developing or analyzing optimization algorithms, the extended rod flow gives a continuous-time model of Adam that stays accurate at the edge of stability, where the stable flow breaks down. Use it to predict and interpret the oscillatory dynamics of adaptive optimizers in that regime. Because it tracks how optimizers like Adam actually move through the loss landscape, it may also guide the design of more stable and efficient training strategies.
Key insights
Rod flow models accurately capture adaptive optimizer dynamics at the edge of stability by tracking average iterates and oscillation extent.
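To make "average iterates and oscillation extent" concrete, here is a minimal sketch on a one-dimensional quadratic. It assumes the simplest possible reading of the rod: the midpoint and the signed half-difference of two consecutive gradient descent iterates. The function names and the toy objective are illustrative, and the paper's exact parameterization may differ.

```python
import numpy as np

def gd_iterates(w0, curvature, lr, steps):
    """Gradient descent on the toy loss L(w) = 0.5 * curvature * w**2."""
    w = np.empty(steps + 1)
    w[0] = w0
    for t in range(steps):
        w[t + 1] = w[t] - lr * curvature * w[t]
    return w

def rod_coordinates(w):
    """Midpoint and signed half-length of each pair of consecutive iterates.

    Assumption: this is one simple way to summarize a "rod"; it is not
    taken from the paper.
    """
    center = 0.5 * (w[1:] + w[:-1])
    half_extent = 0.5 * (w[1:] - w[:-1])
    return center, half_extent

# lr * curvature = 1.9 sits just inside the classical 2 / lr stability limit,
# so the iterates flip sign every step while shrinking slowly.
w = gd_iterates(w0=1.0, curvature=1.9, lr=1.0, steps=20)
center, half_extent = rod_coordinates(w)

# The raw iterates swing widely; the midpoints stay much closer to the minimum,
# and the half-lengths carry the size of the oscillation.
print(np.abs(w[:5]))
print(np.abs(center[:5]))
print(np.abs(half_extent[:5]))
```

In this toy case the midpoint is 20 times smaller than the iterate itself, which is why a model of the midpoint plus the extent can be smooth even when the iterates are not.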
Principles
- Adaptive optimizers operate at a hyperparameter-dependent EoS threshold.
- Oscillatory dynamics require phase-space modeling for momentum methods.
- The preconditioner adapts to absorb excess sharpness, so training stabilizes at the threshold instead of diverging.
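The first principle can be checked numerically on a quadratic. The sketch below assumes the threshold takes the form $(2 + 2\beta_1) / ((1 - \beta_1)\eta)$, which is the stability limit of EMA-style momentum with a frozen preconditioner and matches the form reported in earlier edge-of-stability work on adaptive methods. The summary does not state the paper's own formula, so treat this as an assumption. For Adam, `curvature` would stand for the preconditioned sharpness.

```python
import numpy as np

def eos_threshold(lr, beta1):
    """Curvature at which EMA-momentum descent on a quadratic loses stability.

    Assumed form (not quoted from the paper): (2 + 2*beta1) / ((1 - beta1) * lr).
    With beta1 = 0 it reduces to the gradient descent limit 2 / lr.
    """
    return (2.0 + 2.0 * beta1) / ((1.0 - beta1) * lr)

def spectral_radius(curvature, lr, beta1):
    """Spectral radius of the linear map (w, m) -> (w', m') defined by
        m' = beta1 * m + (1 - beta1) * curvature * w
        w' = w - lr * m'
    on the toy loss 0.5 * curvature * w**2.
    """
    a = lr * (1.0 - beta1) * curvature
    step = np.array([[1.0 - a, -lr * beta1],
                     [(1.0 - beta1) * curvature, beta1]])
    return np.max(np.abs(np.linalg.eigvals(step)))

lr, beta1 = 1e-3, 0.9
threshold = eos_threshold(lr, beta1)  # roughly 38 / lr when beta1 = 0.9

print(threshold)
print(spectral_radius(0.99 * threshold, lr, beta1))  # below 1: iterates contract
print(spectral_radius(1.01 * threshold, lr, beta1))  # above 1: iterates blow up
```

The threshold depends on both the learning rate and $\beta_1$, which is the sense in which it is hyperparameter-dependent.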
Method
The rod flow method extends to Adam by lifting to phase space $(w,m)$ and treating the second moment $\nu$ as a smooth auxiliary variable, enabling continuous-time modeling of oscillatory dynamics.
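For reference, the discrete iteration that such a continuous-time model has to track is the standard Adam update on the state $(w, m, \nu)$. The sketch below is the textbook update with bias correction, not the paper's rod flow ODEs, which the summary does not reproduce. The toy objective and the variable names are illustrative.

```python
import numpy as np

def adam_step(w, m, nu, grad, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One standard Adam update on the state (w, m, nu); t is the 1-based step."""
    m = beta1 * m + (1.0 - beta1) * grad          # first moment (momentum)
    nu = beta2 * nu + (1.0 - beta2) * grad ** 2   # second moment
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected first moment
    nu_hat = nu / (1.0 - beta2 ** t)              # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(nu_hat) + eps)  # preconditioned step
    return w, m, nu

# Toy quadratic 0.5 * sum(curvature * w**2) with very different curvatures.
curvature = np.array([1.0, 100.0])
w = np.array([1.0, 1.0])
m = np.zeros_like(w)
nu = np.zeros_like(w)

history = [w.copy()]
for t in range(1, 4):
    grad = curvature * w
    w, m, nu = adam_step(w, m, nu, grad, t)
    history.append(w.copy())

# After the first step both coordinates have moved by about lr, despite the
# 100x gap in curvature: the second moment rescales each direction.
print(history[1])
```

The first moment $m$ inherits the sign flips of the gradient when the iterates oscillate, while $\nu$ averages squared gradients, which is consistent with lifting the rod to $(w, m)$ and handling $\nu$ separately.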
In practice
- Use rod flow to analyze optimizer behavior at the edge of stability.
- Employ low-rank representations for $\Sigma$ to manage computational cost.
- Incorporate bias correction for accurate modeling of Adam's early training.
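The low-rank suggestion above can be sketched as follows. The summary does not define $\Sigma$, so this assumes it is a symmetric positive semidefinite $d \times d$ matrix and stores it as a factor $U$ with $\Sigma \approx U U^\top$. The helper names are hypothetical, and the dense eigendecomposition is only for demonstration at small $d$.

```python
import numpy as np

def low_rank_factor(sigma, rank):
    """Return U (d x rank) such that U @ U.T is the best rank-`rank`
    approximation of a symmetric PSD matrix sigma."""
    eigvals, eigvecs = np.linalg.eigh(sigma)   # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:rank]     # indices of the largest ones
    scale = np.sqrt(np.clip(eigvals[top], 0.0, None))
    return eigvecs[:, top] * scale

def low_rank_matvec(u, v):
    """Compute (U @ U.T) @ v in O(d * rank) without forming the d x d matrix."""
    return u @ (u.T @ v)

rng = np.random.default_rng(0)
d, r = 200, 3
basis = rng.standard_normal((d, r))
sigma = basis @ basis.T        # exactly rank r, so truncation loses nothing here
u = low_rank_factor(sigma, r)
v = rng.standard_normal(d)

print(u.shape)                                      # d * r numbers instead of d * d
print(np.allclose(low_rank_matvec(u, v), sigma @ v))
```

For a network with millions of parameters the dense matrix cannot be formed at all, so in practice the factor would have to be built and updated directly, which this sketch does not attempt.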
Topics
- Rod Flow
- Edge of Stability
- Adaptive Gradient Methods
- Adam Optimizer
- Continuous-time Models
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.