Does the Adam Optimizer Amplify Catastrophic Forgetting?

2026-03-17 · Source: HackerNoon · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, short

Summary

Catastrophic forgetting remains a severe obstacle to the broad application of artificial neural networks (ANNs), particularly in online and continual learning, and both how to quantify it and what contributes to it are still poorly understood. This paper provides evidence that the choice of gradient-based optimization algorithm significantly affects catastrophic forgetting and, surprisingly, that classical algorithms such as vanilla SGD often suffer less forgetting than modern ones such as Adam. The study also empirically compares four existing metrics for quantifying catastrophic forgetting and finds that the measured degree of forgetting is highly sensitive to the metric used: different principled metrics can lead to dramatically different conclusions. The authors therefore recommend a more rigorous experimental methodology, in which inter-task forgetting in supervised learning is measured with retention and relearning metrics concurrently, and intra-task forgetting in reinforcement learning is measured with pairwise interference. The work underscores the need for careful optimizer selection and for a standardized, multi-metric approach to assessing catastrophic forgetting.
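To make the difference between retention and relearning concrete, here is a minimal Python sketch. It assumes simple, hypothetical definitions rather than the paper's exact formulas: retention as task-A accuracy after training on task B divided by task-A accuracy before it, and relearning as a savings score, 1 minus the ratio of steps needed to relearn task A to steps needed to learn it the first time. The function names and learning curves are illustrative and are not taken from the paper or the dylanashley/catastrophic-forgetting repository.

```python
from typing import Sequence


def retention(acc_before: float, acc_after: float) -> float:
    """Share of task-A accuracy kept after training on task B.

    Assumed (hypothetical) definition: acc_after / acc_before.
    1.0 means nothing was lost; values near 0.0 mean almost everything was.
    """
    if acc_before <= 0:
        raise ValueError("acc_before must be positive")
    return acc_after / acc_before


def steps_to_criterion(curve: Sequence[float], criterion: float) -> int:
    """First training step at which accuracy reaches `criterion`.

    Assumes curve[k] is the accuracy measured after k + 1 training steps.
    """
    for step, acc in enumerate(curve, start=1):
        if acc >= criterion:
            return step
    raise ValueError("criterion never reached")


def relearning_savings(
    first_curve: Sequence[float],
    relearn_curve: Sequence[float],
    criterion: float,
) -> float:
    """Savings-style relearning score (assumed form).

    1 - (steps to relearn task A) / (steps to learn task A originally).
    Higher means task A is regained faster than it was first learned.
    """
    first = steps_to_criterion(first_curve, criterion)
    again = steps_to_criterion(relearn_curve, criterion)
    return 1.0 - again / first


if __name__ == "__main__":
    # Hypothetical learning curves for task A, not results from the paper.
    first_pass = [0.40, 0.62, 0.78, 0.86, 0.91, 0.95]  # initial training
    relearn_pass = [0.70, 0.88, 0.93]                   # after training on task B
    print("retention:", retention(acc_before=0.95, acc_after=0.70))
    print("relearning:", relearning_savings(first_pass, relearn_pass, criterion=0.9))
```

With these made-up numbers, retention reports that roughly a quarter of task-A accuracy was lost, while the relearning score shows task A returning to criterion in 3 steps instead of 5. The two numbers capture different aspects of forgetting, which illustrates why reporting them together can be more informative than either alone.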

Key takeaway

Vanilla SGD often causes less catastrophic forgetting (CF) than modern optimizers such as Adam, which makes optimizer choice a critical factor for online learning systems. The measured degree of CF is highly sensitive to the chosen metric: different principled metrics (e.g., retention, relearning, pairwise interference) can yield dramatically different conclusions. Robust assessment therefore calls for a more rigorous experimental methodology, with retention and relearning measured concurrently in supervised learning and pairwise interference used in reinforcement learning.
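Pairwise interference asks how a learning update on one sample changes the loss on another. The sketch below assumes one common formulation, not necessarily the paper's: the change in squared-error loss on sample j after a single SGD step on sample i, reported alongside its first-order approximation (minus the step size times the inner product of the two gradients). The linear model, the sign convention (positive means the update hurt sample j), and names such as `pairwise_interference` are illustrative assumptions.

```python
from typing import List, Tuple

Vector = List[float]
Sample = Tuple[Vector, float]  # (input features, regression target)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def loss(w: Vector, x: Vector, y: float) -> float:
    """Squared error of the linear prediction w . x on one sample."""
    return 0.5 * (dot(w, x) - y) ** 2


def grad(w: Vector, x: Vector, y: float) -> Vector:
    """Gradient of `loss` with respect to w: (w . x - y) * x."""
    err = dot(w, x) - y
    return [err * xk for xk in x]


def pairwise_interference(
    w: Vector, sample_i: Sample, sample_j: Sample, lr: float = 0.1
) -> Tuple[float, float]:
    """Effect of one SGD step on sample_i on the loss of sample_j.

    Returns (measured, first_order):
      measured    = loss_j(after step on i) - loss_j(before)
      first_order = -lr * (grad_i . grad_j), its first-order Taylor estimate
    Assumed sign convention: positive = the update hurt sample_j (interference),
    negative = it helped (generalization).
    """
    (xi, yi), (xj, yj) = sample_i, sample_j
    gi = grad(w, xi, yi)
    w_after = [wk - lr * gk for wk, gk in zip(w, gi)]  # new list; w is unchanged
    measured = loss(w_after, xj, yj) - loss(w, xj, yj)
    first_order = -lr * dot(gi, grad(w, xj, yj))
    return measured, first_order


if __name__ == "__main__":
    w0 = [0.0, 0.0]
    a = ([1.0, 0.0], 1.0)
    b_conflict = ([1.0, 0.1], -1.0)  # similar input, opposite target
    b_agree = ([1.0, 0.1], 1.0)      # similar input, same target
    print("conflicting pair:", pairwise_interference(w0, a, b_conflict))
    print("agreeing pair:", pairwise_interference(w0, a, b_agree))
```

In this toy setup, two nearly identical inputs with opposite targets interfere (both values come out positive), while the same inputs with matching targets generalize (both values negative). With a small step size the measured change and the gradient inner product stay close, which is why gradient inner products are sometimes used as a cheaper proxy.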

Topics

Catastrophic Forgetting
Gradient-based Optimization
Continual Learning
Reinforcement Learning
Forgetting Metrics

Code references

dylanashley/catastrophic-forgetting

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by HackerNoon.