Does the Adam Optimizer Amplify Catastrophic Forgetting?
Summary
Catastrophic forgetting remains a severe obstacle to the broad application of artificial neural networks (ANNs), particularly in online and continual learning, and both its measurement and its contributing factors are poorly understood. This paper provides evidence that the choice of gradient-based optimization algorithm significantly affects catastrophic forgetting: classical algorithms such as vanilla SGD often forget less than modern ones such as Adam. The study also empirically compares four existing metrics for quantifying catastrophic forgetting and finds that the measured degree of forgetting is highly sensitive to the metric used, so different principled metrics can lead to dramatically different conclusions. The authors recommend a more rigorous experimental methodology: inter-task forgetting in supervised learning should be measured with retention and relearning metrics together, and intra-task forgetting in reinforcement learning with pairwise interference. The work highlights the need for careful optimizer selection and a standardized, multi-metric approach to assessing catastrophic forgetting.
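To make the two supervised-learning metrics concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the paper's experimental setup: the two toy linear-regression tasks, the helper names (`make_task`, `train_until`), the loss threshold, and the exact definitions used (retention as the loss on task A after training on task B, relearning as the number of steps needed to re-reach the threshold on task A) are all hypothetical choices made for this example. The update rule is plain full-batch gradient descent.

```python
import random

def make_task(w_true, seed, n=32):
    """Hypothetical toy task: noiseless linear regression with fixed target weights."""
    rng = random.Random(seed)
    xs = [[rng.gauss(0, 1) for _ in w_true] for _ in range(n)]
    ys = [sum(w * x for w, x in zip(w_true, row)) for row in xs]
    return xs, ys

def mse(w, xs, ys):
    """Mean squared error of the linear model w on a task."""
    total = 0.0
    for row, y in zip(xs, ys):
        pred = sum(wi * xi for wi, xi in zip(w, row))
        total += (pred - y) ** 2
    return total / len(ys)

def train_until(w, xs, ys, lr=0.05, threshold=1e-3, max_steps=10_000):
    """Full-batch gradient descent until the loss reaches the threshold.

    Returns the trained weights and the number of update steps taken.
    """
    w = list(w)
    for step in range(max_steps):
        if mse(w, xs, ys) <= threshold:
            return w, step
        grad = [0.0] * len(w)
        for row, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, row)) - y
            for k, xk in enumerate(row):
                grad[k] += 2.0 * err * xk / len(ys)
        w = [wi - lr * g for wi, g in zip(w, grad)]
    return w, max_steps

# Two tasks with deliberately different target weights (assumed values).
task_a = make_task([1.0, -2.0, 0.5], seed=0)
task_b = make_task([-1.0, 0.5, 2.0], seed=1)

# Phase 1: learn task A from scratch and record how long it takes.
w, steps_first = train_until([0.0, 0.0, 0.0], *task_a)

# Phase 2: learn task B, starting from the task-A solution.
w, _ = train_until(w, *task_b)

# Retention-style measure: how well task A is still solved after task B.
loss_a_after_b = mse(w, *task_a)

# Relearning-style measure: how long it takes to recover task A.
w, steps_relearn = train_until(w, *task_a)
loss_a_final = mse(w, *task_a)

print("steps to learn A the first time:", steps_first)
print("loss on A after training on B:  ", loss_a_after_b)
print("steps to relearn A:             ", steps_relearn)
```

The two numbers answer different questions: the retention-style loss shows how much performance on task A was lost, while the relearning-style step count shows how hard that loss is to undo. Reporting only one of them can hide the other, which is why the summary above recommends using them together.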
Key takeaway
Vanilla SGD often causes less catastrophic forgetting (CF) than modern optimizers such as Adam, which matters for online learning systems. The measured degree of CF is highly sensitive to the chosen metric: different principled metrics (e.g., retention, relearning, pairwise interference) can yield dramatically different conclusions. For robust assessment, supervised learning experiments should report retention and relearning metrics together, while reinforcement learning experiments should use pairwise interference. Both recommendations call for a more rigorous experimental methodology.
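Pairwise interference can be sketched in a few lines. The version below is a simplified illustration, not the paper's formulation: it assumes a linear model with squared error and one vanilla SGD step, and it defines interference as the change in loss on sample j caused by an update on sample i. The function names and sample values are hypothetical, and in a reinforcement learning setting the loss would typically be a value-prediction error rather than a supervised one.

```python
def predict(w, x):
    """Linear prediction w . x."""
    return sum(wi * xi for wi, xi in zip(w, x))

def squared_error(w, x, y):
    return (predict(w, x) - y) ** 2

def sgd_step(w, x, y, lr):
    """One vanilla SGD step on the squared error of a single sample."""
    err = predict(w, x) - y
    return [wi - lr * 2.0 * err * xi for wi, xi in zip(w, x)]

def pairwise_interference(w, sample_i, sample_j, lr=0.01):
    """Change in loss on sample j after one update on sample i.

    Positive: the update on i made j worse (interference).
    Negative: the update on i also helped j (transfer).
    """
    x_i, y_i = sample_i
    x_j, y_j = sample_j
    w_new = sgd_step(w, x_i, y_i, lr)
    return squared_error(w_new, x_j, y_j) - squared_error(w, x_j, y_j)

w0 = [0.0, 0.0]
sample_a = ([1.0, 0.0], 1.0)     # the sample being trained on
conflicting = ([1.0, 0.0], -1.0)  # same input, opposite target
orthogonal = ([0.0, 1.0], 1.0)    # shares no input features with sample_a

same = pairwise_interference(w0, sample_a, sample_a)
conflict = pairwise_interference(w0, sample_a, conflicting)
unrelated = pairwise_interference(w0, sample_a, orthogonal)

print("update on a, measured on a:          ", same)
print("update on a, measured on conflicting:", conflict)
print("update on a, measured on orthogonal: ", unrelated)
```

Replacing `sgd_step` with a different update rule would let the same measurement be repeated for another optimizer. No particular outcome of such a comparison is implied by this sketch.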
Topics
- Catastrophic Forgetting
- Gradient-based Optimization
- Continual Learning
- Reinforcement Learning
- Forgetting Metrics
Best for: Research Scientist, AI Engineer, NLP Engineer, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by HackerNoon.