Randomized Subspace Nesterov Accelerated Gradient
Summary
This paper introduces Randomized-Subspace Nesterov Accelerated Gradient (RS-NAG) methods for smooth convex and smooth strongly convex optimization, addressing the challenge of integrating Nesterov acceleration with randomized subspace techniques. The authors develop a novel three-sequence formulation tailored to matrix smoothness, which recovers classical Nesterov methods in full-dimensional scenarios. The resulting theory provides accelerated oracle-complexity guarantees, explicitly showing how matrix smoothness and sketch distribution influence complexity. The work also offers a unified framework for comparing different sketch families, such as Haar, coordinate, and Gaussian sketches, and identifies conditions under which randomized-subspace acceleration outperforms full-dimensional Nesterov acceleration in terms of oracle complexity. Numerical experiments on quadratic objectives and $\ell_2$-regularized logistic regression datasets, with a fixed sketch dimension of $r=1$, empirically validate the theoretical predictions, demonstrating strong performance and alignment with dataset-dependent constants.
Key takeaway
For Machine Learning Engineers and AI Scientists working on large-scale, high-dimensional optimization problems, adopting Randomized-Subspace Nesterov Accelerated Gradient (RS-NAG) methods can significantly improve convergence rates. Your teams should consider RS-NAG, especially in communication-limited or forward-mode AD environments, as it offers superior oracle complexity compared to standard Nesterov acceleration. Pay close attention to the problem's matrix smoothness properties and the chosen sketch distribution, as these factors directly influence performance. Empirically, setting the sketch dimension $r=1$ often yields the best results for common sketch types.
Key insights
RS-NAG methods accelerate optimization by using low-dimensional projected gradients, outperforming standard Nesterov acceleration in specific oracle complexity scenarios.
Principles
- Matrix smoothness and sketch distribution directly impact oracle complexity.
- Optimal sketch dimension for Haar, Coordinate, and Gaussian sketches is $r=1$.
- Three-sequence formulations enable robust convergence analysis for sketched acceleration.
Method
RS-NAG uses a three-sequence formulation for sketched descent, auxiliary estimate sequences, and projected gradients $P_k P_k^\top \nabla f(y_k)$ to achieve accelerated convergence in convex and strongly convex settings.
In practice
- Consider RS-NAG for high-dimensional problems with communication or memory constraints.
- Evaluate $Q$-values (e.g., $Q_H, Q_C, Q_G$) to select optimal sketch distribution.
- Set sketch dimension $r=1$ for Haar, Coordinate, and Gaussian sketches for best oracle performance.
Topics
- Randomized Subspace Methods
- Nesterov Acceleration
- Matrix Smoothness
- Oracle Complexity
- Convex Optimization
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by stat.ML updates on arXiv.org.