Preserving Plasticity in Continual Learning via Dynamical Isometry

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Continual training of deep neural networks often causes a progressive loss of plasticity, which limits further learning. This research relates plasticity to the empirical Neural Tangent Kernel and identifies dynamical isometry, the condition in which layer-wise Jacobian singular values remain near one, as crucial for preserving it. The work revisits almost-everywhere isometric networks and demonstrates that near-dynamical isometry is compatible with expressive nonlinear representations. For general architectures, the authors propose an efficient isometry-promoting regularization scheme that can reactivate dormant ReLU units. The paper also introduces AdamO, an Adam-style adaptive optimizer that decouples isometry regularization from gradient updates, much as AdamW decouples weight decay. Prior plasticity-preserving approaches are reinterpreted through this lens and shown to address only a partial measure of isometry. The proposed methods consistently match or outperform existing approaches across supervised and reinforcement-learning continual-learning benchmarks designed to induce plasticity loss.

Key takeaway

For machine learning engineers building continual-learning systems, plasticity loss limits long-term model performance. Consider integrating the proposed AdamO optimizer or the isometry-promoting regularization scheme into your training pipeline. According to the paper, these methods help maintain network plasticity, can reactivate dormant ReLU units, and match or outperform existing approaches on supervised and reinforcement-learning benchmarks designed to induce plasticity loss.

Key insights

Dynamical isometry is a key mechanism for preserving plasticity in continual learning.
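
To make the isometry condition concrete, the sketch below computes the singular values of a single layer's input-output Jacobian and reports how far they stray from one. It is an illustration under stated assumptions, not the paper's measurement procedure: the helper names `jacobian_singular_values` and `isometry_gap`, the single ReLU layer, and the 8-unit size are all hypothetical choices made for this example.

```python
import numpy as np

def jacobian_singular_values(W, x):
    """Singular values of the Jacobian of y = relu(W @ x) with respect to x."""
    pre = W @ x
    mask = (pre > 0).astype(W.dtype)  # ReLU derivative: 1 where active, 0 where inactive
    J = mask[:, None] * W             # J = diag(mask) @ W
    return np.linalg.svd(J, compute_uv=False)

def isometry_gap(singular_values):
    """Largest deviation of any singular value from one (0 means exactly isometric)."""
    return float(np.max(np.abs(singular_values - 1.0)))

rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((8, 8)))  # random orthogonal weight matrix
x = -Q[0]  # input chosen so that unit 0 has pre-activation -1 and is inactive

# A purely linear orthogonal layer is isometric: all singular values equal one.
linear_gap = isometry_gap(np.linalg.svd(Q, compute_uv=False))

# With the ReLU mask, every inactive unit zeroes a row of the Jacobian,
# which sends one singular value to zero.
relu_gap = isometry_gap(jacobian_singular_values(Q, x))
```

In this toy setting `linear_gap` is zero up to floating-point error, while `relu_gap` is one because the inactive unit removes a direction from the Jacobian. That is one way to see why inactive ReLU units and loss of isometry go together.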

Method

The paper proposes an efficient isometry-promoting regularization scheme for general architectures. AdamO, an Adam-style optimizer, decouples the isometry regularization from the gradient update, analogous to the way AdamW decouples weight decay.
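
The decoupling idea can be sketched as follows. This summary does not give AdamO's actual update rule, so the code is an assumption-laden illustration rather than the authors' algorithm: it uses a standard soft-orthogonality penalty, 0.25 * ||W^T W - I||_F^2, as a stand-in isometry regularizer, and applies its gradient outside Adam's adaptive rescaling, in the same way that AdamW applies weight decay. The function names, the penalty, and the hyperparameters `lr` and `lam` are hypothetical.

```python
import numpy as np

def orthogonality_penalty(W):
    """0.25 * ||W^T W - I||_F^2, zero exactly when W has orthonormal columns."""
    A = W.T @ W - np.eye(W.shape[1])
    return 0.25 * float(np.sum(A * A))

def decoupled_isometry_step(W, grad, m, v, t, lr=1e-2, beta1=0.9,
                            beta2=0.999, eps=1e-8, lam=1.0):
    """One Adam update on the loss gradient plus a separate pull toward orthogonality."""
    # Standard Adam moment estimates, computed from the task-loss gradient only.
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    adam_update = m_hat / (np.sqrt(v_hat) + eps)
    # Gradient of orthogonality_penalty with respect to W. It is applied
    # directly, so it never passes through the adaptive rescaling.
    iso_grad = W @ (W.T @ W - np.eye(W.shape[1]))
    W = W - lr * adam_update - lr * lam * iso_grad
    return W, m, v

rng = np.random.default_rng(0)
W = 0.5 * rng.standard_normal((6, 6))
m = np.zeros_like(W)
v = np.zeros_like(W)

before = orthogonality_penalty(W)
for t in range(1, 501):
    grad = np.zeros_like(W)  # stand-in for a real task-loss gradient
    W, m, v = decoupled_isometry_step(W, grad, m, v, t)
after = orthogonality_penalty(W)
```

With the task gradient set to zero, only the decoupled term acts, and `after` ends up below `before` as the weights drift toward an orthogonal matrix. In a real training loop `grad` would come from backpropagation, and the two terms would compete.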

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.