Preconditioned inexact stochastic ADMM for deep models

2026-02-20 · Source: Nature Machine Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

A new optimization algorithm, Preconditioned Inexact Stochastic Alternating Direction Method of Multipliers (PISA), has been developed to address two limitations of stochastic gradient descent (SGD)-based methods in deep learning: slow convergence and sensitivity to data heterogeneity in distributed settings. PISA comes with convergence guarantees that require only Lipschitz continuity of the gradient on a bounded region, a weaker assumption than those typically needed to analyze stochastic algorithms. Its structure supports scalable parallel computing and accommodates several preconditioning techniques, including second-order information, second-moment estimates, and orthogonalized momentum computed via Newton–Schulz iterations. Two computationally efficient variants were derived from it: SISA (Second-moment-based Inexact SADMM) and NSISA (Newton–Schulz-based Inexact SADMM), where SADMM stands for stochastic ADMM. Extensive experiments on diverse deep models, including vision models, large language models (LLMs) such as GPT2-Nano, GPT2-Medium, and GPT2-XL, reinforcement learning models, generative adversarial networks (GANs), and recurrent neural networks, showed that SISA and NSISA outperform various state-of-the-art optimizers, especially when datasets such as MNIST and CIFAR-10 are distributed heterogeneously across workers.
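The summary mentions orthogonalized momentum computed with Newton–Schulz iterations. The sketch below shows the classic cubic Newton–Schulz iteration in plain Python: after scaling by the Frobenius norm, repeating X ← 1.5X - 0.5XXᵀX drives every nonzero singular value of a matrix G = USVᵀ toward 1, so X approaches the orthogonal factor UVᵀ. The helper names (`matmul`, `transpose`, `newton_schulz_orthogonalize`) and the iteration count are illustrative assumptions; NSISA may use different coefficients, a different number of steps, or apply the iteration at a different point in its update.

```python
# Hypothetical helpers for illustration; not taken from the PISA repository.


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor U V^T of g, where g = U S V^T.

    Assumes g is nonzero; exactly-zero singular values stay at zero.
    """
    # Scale by the Frobenius norm so every singular value lies in (0, 1],
    # inside the convergence region (0, sqrt(3)) of the cubic iteration.
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        # Cubic Newton-Schulz step: X <- 1.5 X - 0.5 X X^T X.
        # Each singular value s is mapped to s * (3 - s^2) / 2, whose fixed point is 1.
        xxt_x = matmul(matmul(x, transpose(x)), x)
        x = [[1.5 * p - 0.5 * q for p, q in zip(rp, rq)] for rp, rq in zip(x, xxt_x)]
    return x


if __name__ == "__main__":
    momentum = [[3.0, 1.0], [1.0, 2.0]]  # toy "momentum matrix"
    print(newton_schulz_orthogonalize(momentum))
```

For a symmetric positive-definite input such as the toy matrix above, U = V, so the result should be close to the identity matrix. Convergence is quadratic once singular values are near 1, but very small singular values grow only by about a factor of 1.5 per step, which is why ill-conditioned inputs need more iterations.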

Key takeaway

For AI engineers and research scientists working on distributed deep learning with heterogeneous datasets, PISA and its variants (SISA, NSISA) offer a robust alternative to traditional SGD-based optimizers. Integrating these ADMM-based methods may yield faster convergence and higher accuracy, particularly when data are non-IID across workers. Consider experimenting with SISA for vision tasks and NSISA for LLM fine-tuning to improve training efficiency and model performance.

Key insights

PISA is a new ADMM-based optimizer for deep learning that offers convergence guarantees under weak assumptions and superior performance on heterogeneous data.

Principles

Relaxing convergence assumptions enhances optimizer applicability.
Preconditioning improves stochastic optimization performance.
Data heterogeneity is a critical challenge for distributed learning.
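The first principle refers to the smoothness assumption named in the summary. In its standard form (the paper's exact statement, including how the bounded region Ω is chosen, may differ), the gradient only needs to be Lipschitz on a bounded set Ω rather than on the whole space. The one-dimensional example f(x) = x⁴ shows why this is weaker: its gradient is not globally Lipschitz, yet it is Lipschitz on any interval [-R, R].

```latex
\begin{align*}
&\|\nabla f(x) - \nabla f(y)\| \le L_\Omega \,\|x - y\| \quad \text{for all } x, y \in \Omega, \\
&f(x) = x^4,\ \Omega = [-R, R]:\quad
|4x^3 - 4y^3| = 4\,|x - y|\,|x^2 + xy + y^2| \le 12R^2\,|x - y|.
\end{align*}
```

Here L_Ω = 12R² works, but the constant grows with R, so no single global constant exists. Losses of deep networks, which involve products of weight matrices, are typically in the same situation.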

Method

PISA is built on a preconditioned, inexact stochastic ADMM framework. Each subproblem is solved only approximately using stochastic gradients, and preconditioning matrices inject second-moment or orthogonalized-momentum information into the updates. Because the ADMM splitting decouples the per-worker subproblems, they can be solved in parallel.
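To make the method description concrete, the sketch below runs a generic linearized (inexact) consensus ADMM on a toy problem, with an Adam-style diagonal second-moment preconditioner in each worker's subproblem. It only illustrates how inexact stochastic subproblem solves, preconditioning, and parallel worker steps fit together; it is not the paper's SISA update, and the step sizes and convergence conditions in the authors' code (Tracy-Wang7/PISA) may differ. The function name `sisa_like_consensus_admm`, the quadratic local losses, and the Gaussian gradient noise standing in for minibatching are all hypothetical choices.

```python
import math
import random


def sisa_like_consensus_admm(
    targets, rho=1.0, lr=0.2, beta=0.99, eps=1e-8, steps=3000, noise_std=0.0, seed=0
):
    """Hypothetical preconditioned, linearized consensus ADMM (not the paper's SISA).

    Worker i holds the toy local loss f_i(x) = 0.5 * ||x - targets[i]||^2, so the
    minimizer of sum_i f_i is the coordinate-wise mean of the targets. Different
    targets per worker play the role of heterogeneous (non-IID) local data.
    """
    rng = random.Random(seed)
    n, dim = len(targets), len(targets[0])
    x = [[0.0] * dim for _ in range(n)]    # local primal variables x_i
    lam = [[0.0] * dim for _ in range(n)]  # dual variables lambda_i
    v = [[0.0] * dim for _ in range(n)]    # EMA of squared stochastic gradients
    z = [0.0] * dim                        # global consensus variable
    for t in range(1, steps + 1):
        # x-steps: each worker uses only its own data, z, and lambda_i,
        # so this loop could run in parallel across workers.
        for i in range(n):
            for d in range(dim):
                # Stochastic gradient of f_i (Gaussian noise mimics minibatch noise).
                g = x[i][d] - targets[i][d] + rng.gauss(0.0, noise_std)
                v[i][d] = beta * v[i][d] + (1.0 - beta) * g * g
                v_hat = v[i][d] / (1.0 - beta ** t)  # Adam-style bias correction
                p = (math.sqrt(v_hat) + eps) / lr     # diagonal preconditioner entry
                # Inexact subproblem solve: linearize f_i at x_i and add the
                # proximal term (p / 2) * (x - x_i)^2, which has a closed-form minimizer.
                x[i][d] = (rho * z[d] + p * x[i][d] - g - lam[i][d]) / (rho + p)
        # z-step: exact minimization of the augmented Lagrangian over z.
        z = [sum(x[i][d] + lam[i][d] / rho for i in range(n)) / n for d in range(dim)]
        # Dual ascent on the consensus constraints x_i = z.
        for i in range(n):
            for d in range(dim):
                lam[i][d] += rho * (x[i][d] - z[d])
    return z


if __name__ == "__main__":
    local_targets = [[1.0, -2.0], [4.0, 1.0], [-3.0, 3.0], [6.0, 6.0]]
    print(sisa_like_consensus_admm(local_targets))                 # noise-free
    print(sisa_like_consensus_admm(local_targets, noise_std=0.1))  # stochastic
```

For these targets the consensus variable should approach [2.0, 2.0], the mean of the local targets. The dual variables absorb the heterogeneity: at the solution each lambda_i equals the negative of worker i's local gradient, which is nonzero because the local data differ. Only the z-step requires communication between workers.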

In practice

Use SISA for vision models and GANs for faster convergence.
Apply NSISA for fine-tuning large language models like GPT2.
Consider PISA variants for distributed learning with non-IID data.

Topics

PISA Algorithm
Stochastic ADMM
Deep Learning Optimization
Data Heterogeneity
Convergence Theory

Code references

Tracy-Wang7/PISA

Best for: Research Scientist, AI Engineer, Machine Learning Engineer, AI Researcher, AI Scientist, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Nature Machine Intelligence.