Zeroth-Order Optimization at the Edge of Stability

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

A new study derives an explicit step-size condition for the mean-square linear stability of zeroth-order (ZO) optimization methods built on the standard two-point estimator. The analysis shows that ZO stability is governed by the entire Hessian spectrum, in sharp contrast with first-order (FO) methods, where stability depends only on the largest Hessian eigenvalue. Because computing the full Hessian spectrum is impractical for neural networks, the authors also derive tractable stability bounds that require only the largest eigenvalue and the Hessian trace. Empirically, full-batch ZO methods (ZO-GD, ZO-GDM, and ZO-Adam) consistently stabilize near the predicted boundary across a range of deep learning training tasks. This points to an implicit regularization effect in which large ZO step sizes primarily regularize the Hessian trace.
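As a rough illustration of the two-point estimator mentioned above, the sketch below probes the objective along one random direction and scales that direction by the finite-difference slope. The central-difference form, the Gaussian direction, the smoothing radius `mu`, and the names `two_point_grad` and `zo_gd_step` are assumptions made for this example; the summary does not specify the exact variant the authors analyze.

```python
import numpy as np

def two_point_grad(f, x, rng, mu=1e-4):
    """Two-point zeroth-order gradient estimate of f at x (illustrative).

    Assumptions: Gaussian probe direction and a central difference with
    smoothing radius mu. Only two evaluations of f are needed.
    """
    u = rng.standard_normal(x.shape)  # random probe direction
    # Finite-difference estimate of the directional derivative along u.
    slope = (f(x + mu * u) - f(x - mu * u)) / (2.0 * mu)
    return slope * u

def zo_gd_step(f, x, rng, step_size, mu=1e-4):
    """One ZO-GD update: move against the two-point estimate."""
    return x - step_size * two_point_grad(f, x, rng, mu)
```

Because the estimate always points along the sampled direction, its randomness does not vanish even with a full batch, which is why stability is stated in the mean-square sense.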

Key takeaway

For research scientists optimizing large models with zeroth-order methods, the critical point is that ZO stability is tied to the full Hessian spectrum, not just the top eigenvalue. Use the derived tractable stability bounds, which depend on the largest eigenvalue and the Hessian trace, to guide step-size selection. The results also suggest that larger ZO step sizes implicitly regularize the Hessian trace, giving a different optimization dynamic than first-order approaches.

Key insights

The stability of zeroth-order optimization depends on the full Hessian spectrum, whereas first-order stability depends only on the largest Hessian eigenvalue.
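To make the spectrum dependence concrete, here is a minimal toy calculation, not taken from the paper. It assumes a quadratic loss 0.5 * x^T H x and the update x <- x - step_size * (u^T H x) u with Gaussian u, which is what a central two-point estimate reduces to on a quadratic. Under those assumptions the per-coordinate second moments in the Hessian eigenbasis follow a linear recursion, and its spectral radius decides whether E||x||^2 grows. All function names are hypothetical.

```python
import numpy as np

def zo_second_moment_matrix(eigs, step_size):
    """Matrix M with s_next = M @ s, where s[i] = E[x_i^2] in the Hessian eigenbasis.

    Toy model (an assumption, not the paper's setting): quadratic loss,
    Gaussian probe directions, ZO-GD without momentum.
    """
    lam = np.asarray(eigs, dtype=float)
    diag = 1.0 - 2.0 * step_size * lam + 2.0 * step_size**2 * lam**2
    # Every coordinate also receives step_size^2 * sum_j lam_j^2 * s_j,
    # which couples all eigen-directions together.
    return np.diag(diag) + step_size**2 * np.outer(np.ones_like(lam), lam**2)

def zo_growth_factor(eigs, step_size):
    """Spectral radius of M; a value above 1 means E||x||^2 grows."""
    M = zo_second_moment_matrix(eigs, step_size)
    return float(np.max(np.abs(np.linalg.eigvals(M))))

def gd_growth_factor(eigs, step_size):
    """Per-step contraction factor of plain gradient descent on the same quadratic."""
    lam = np.asarray(eigs, dtype=float)
    return float(np.max(np.abs(1.0 - step_size * lam)))

# Two spectra with the same largest eigenvalue but very different traces.
flat_spectrum = np.ones(50)                      # lambda_max = 1, trace = 50
spiked_spectrum = np.r_[1.0, np.full(49, 0.01)]  # lambda_max = 1, trace = 1.49
step_size = 0.1

for name, eigs in [("flat", flat_spectrum), ("spiked", spiked_spectrum)]:
    print(name, gd_growth_factor(eigs, step_size), zo_growth_factor(eigs, step_size))
```

In this toy setting plain gradient descent contracts for both spectra at a step size of 0.1, since they share the same largest eigenvalue. The zeroth-order recursion, however, grows for the flat spectrum (the one with the larger trace) and contracts for the spiked one.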

Principles

Method

The study derives tractable stability bounds for ZO methods that use only the largest Hessian eigenvalue and the Hessian trace, avoiding computation of the full spectrum.
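The two quantities these bounds rely on can be estimated without forming the Hessian. The sketch below shows one common way to do so, using power iteration for the largest eigenvalue and Hutchinson's estimator for the trace. The summary does not say how the authors measure them, so treat this as an assumption. The callable `hvp` (a Hessian-vector product) and both function names are hypothetical; in a deep learning setting `hvp` would typically come from automatic differentiation.

```python
import numpy as np

def power_iteration_lambda_max(hvp, dim, rng, iters=200):
    """Estimate the dominant Hessian eigenvalue from Hessian-vector products.

    Power iteration converges to the eigenvalue of largest magnitude, which
    equals the largest eigenvalue when the Hessian is positive semidefinite.
    """
    v = rng.standard_normal(dim)
    v /= np.linalg.norm(v)
    lam = 0.0
    for _ in range(iters):
        w = hvp(v)
        lam = float(v @ w)          # Rayleigh quotient for the current unit vector
        norm = np.linalg.norm(w)
        if norm == 0.0:
            break
        v = w / norm
    return lam

def hutchinson_trace(hvp, dim, rng, samples=500):
    """Estimate the Hessian trace as the average of z^T H z over Rademacher z."""
    total = 0.0
    for _ in range(samples):
        z = rng.choice([-1.0, 1.0], size=dim)
        total += float(z @ hvp(z))
    return total / samples

# Stand-in Hessian for demonstration; a real model would supply hvp via autodiff.
H_demo = np.diag([1.0, 2.0, 5.0])
hvp_demo = lambda v: H_demo @ v
rng_demo = np.random.default_rng(0)
lam_max_est = power_iteration_lambda_max(hvp_demo, 3, rng_demo)
trace_est = hutchinson_trace(hvp_demo, 3, rng_demo)
```

Both routines cost one Hessian-vector product per iteration or sample, so the cost scales with the number of probes rather than with the number of parameters squared.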

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.