Lost or Hidden? A Concept-Level Forgetting in Supervised Continual Learning
Summary
A diagnostic framework built on Sparse Autoencoders (SAEs) analyzes catastrophic forgetting in continual learning at a fine-grained concept level. Proposed by Filus et al., it moves beyond traditional task-level performance metrics to investigate how task-specific information evolves within a vision model's representation space. By defining a task-anchored latent feature space in which individual SAE latents act as concept proxies, the researchers decompose forgetting into apparent concept deletion, recoverability, and decodability. Their experiments with ResNet18 on the 2seq-CIFAR10, 2seq-tiny-ImageNet, and 10seq-tiny-ImageNet datasets indicate that a significant portion of seemingly lost concept-level information can be recovered under a linearity assumption. This suggests that forgetting often stems from changes in representational accessibility rather than complete information erasure, although concept decodability degrades as more tasks are introduced.
Key takeaway
For research scientists developing continual learning models, understanding the nature of forgetting at a concept level is crucial. You should integrate diagnostic frameworks using Sparse Autoencoders to distinguish between true information loss and mere representational drift. Prioritize strategies like Learning without Forgetting (LwF) that maintain linear recoverability of fine-grained concepts, as this preserves model interpretability and stability, even if it means a slight trade-off in overall per-task performance compared to methods like DER++.
Key insights
Catastrophic forgetting in continual learning often reflects representational misalignment rather than complete information erasure.
Principles
- Forgetting can be decomposed into deletion, recoverability, and decodability.
- Linear translation can restore much of the lost concept-level information.
- Continual learning strategies differ in fine-grained information preservation.
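The linear-translation idea in the second principle can be illustrated with a small synthetic sketch. It is not the authors' implementation: the feature matrices, the rotation-plus-scaling drift, and the helper name `fit_linear_translation` are all assumptions made for illustration. The point is only that features which look very different after drift may still be mapped back with a single affine transform.

```python
import numpy as np

def fit_linear_translation(feats_new, feats_old):
    """Least-squares affine map (W, b) such that feats_new @ W + b ≈ feats_old."""
    ones = np.ones((feats_new.shape[0], 1))
    X = np.hstack([feats_new, ones])          # append a bias column
    sol, *_ = np.linalg.lstsq(X, feats_old, rcond=None)
    return sol[:-1], sol[-1]                  # W has shape (d, d), b has shape (d,)

rng = np.random.default_rng(0)
n, d = 500, 16

# Hypothetical stand-in for features extracted right after training on task 1.
feats_old = rng.normal(size=(n, d))

# Assumed drift: a rotation and rescaling of the feature space plus small noise.
Q, _ = np.linalg.qr(rng.normal(size=(d, d)))
feats_new = 2.0 * feats_old @ Q + 0.05 * rng.normal(size=(n, d))

# Fit the translation on one split and evaluate it on held-out samples.
W, b = fit_linear_translation(feats_new[:400], feats_old[:400])
recovered = feats_new[400:] @ W + b

err_before = float(np.mean((feats_new[400:] - feats_old[400:]) ** 2))
err_after = float(np.mean((recovered - feats_old[400:]) ** 2))
print(f"MSE before translation: {err_before:.4f}, after: {err_after:.4f}")
```

In this toy setting the information is "hidden" rather than "lost": the raw mismatch is large, but the held-out error after translation is close to the noise level. Real drift need not be linear, which is why the source states the result under a linearity assumption.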
Method
Train Sparse Autoencoders (SAEs) on task data to define a fixed, task-anchored latent space. Analyze concept activation dynamics, linear recoverability via translation, and concept-level decodability using linear probes.
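As a rough sketch of the first step, the snippet below trains a small ReLU sparse autoencoder with an L1 penalty using plain gradient descent, then freezes it and encodes a second set of features with the same weights. The architecture, hyperparameters, synthetic features, and function names (`init_sae`, `encode`, `train_sae`) are illustrative assumptions; the source does not specify the SAE variant or training setup used by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

def init_sae(d_in, d_latent):
    """Small random weights for an untied ReLU sparse autoencoder."""
    return {
        "W_enc": 0.1 * rng.normal(size=(d_in, d_latent)),
        "b_enc": np.zeros(d_latent),
        "W_dec": 0.1 * rng.normal(size=(d_latent, d_in)),
        "b_dec": np.zeros(d_in),
    }

def encode(p, x):
    """Non-negative latent activations; each latent serves as a concept proxy."""
    return np.maximum(x @ p["W_enc"] + p["b_enc"], 0.0)

def loss_and_grads(p, x, l1):
    """Reconstruction error plus L1 sparsity penalty, with manual gradients."""
    n = x.shape[0]
    z = encode(p, x)
    x_hat = z @ p["W_dec"] + p["b_dec"]
    loss = np.sum((x_hat - x) ** 2) / n + l1 * np.sum(z) / n
    d_xhat = 2.0 * (x_hat - x) / n
    d_z = d_xhat @ p["W_dec"].T + l1 / n
    d_pre = d_z * (z > 0)                     # ReLU gate
    grads = {
        "W_enc": x.T @ d_pre,
        "b_enc": d_pre.sum(axis=0),
        "W_dec": z.T @ d_xhat,
        "b_dec": d_xhat.sum(axis=0),
    }
    return loss, grads

def train_sae(x, d_latent=32, l1=0.05, lr=0.01, steps=300):
    """Full-batch gradient descent; returns parameters and the loss history."""
    p = init_sae(x.shape[1], d_latent)
    history = []
    for _ in range(steps):
        loss, grads = loss_and_grads(p, x, l1)
        history.append(float(loss))
        for name in p:
            p[name] -= lr * grads[name]
    return p, history

# Hypothetical backbone features collected after task 1 and after a later task.
feats_task1 = rng.normal(size=(256, 16))
feats_later = feats_task1 + 0.5 * rng.normal(size=(256, 16))

sae, history = train_sae(feats_task1)         # trained once, then kept frozen
z_before = encode(sae, feats_task1)
z_after = encode(sae, feats_later)

# Per-latent firing frequency in the fixed, task-anchored latent space.
rate_before = (z_before > 0).mean(axis=0)
rate_after = (z_after > 0).mean(axis=0)
print(f"loss: {history[0]:.3f} -> {history[-1]:.3f}")
```

Keeping the SAE frozen is what makes the latent space "task-anchored": any change in `rate_after` relative to `rate_before` comes from the backbone's features, not from the autoencoder. On this synthetic data the rates themselves carry no meaning.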
In practice
- Use SAEs to diagnose fine-grained forgetting in vision models.
- Employ linear translation to assess concept recoverability.
- Evaluate concept decodability with linear classifiers.
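The last item, evaluating decodability with a linear classifier, might look like the following minimal sketch. It trains a logistic-regression probe with gradient descent on synthetic stand-ins for latent activations. The data, the labeling rule, and the helper names (`train_linear_probe`, `probe_accuracy`) are assumptions for illustration, not the probing protocol from the paper.

```python
import numpy as np

def train_linear_probe(z, y, lr=0.5, steps=500):
    """Binary logistic-regression probe fitted with full-batch gradient descent."""
    w = np.zeros(z.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(z @ w + b)))   # predicted probabilities
        grad = p - y                             # gradient of the log-loss w.r.t. logits
        w -= lr * (z.T @ grad) / len(y)
        b -= lr * float(grad.mean())
    return w, b

def probe_accuracy(w, b, z, y):
    """Fraction of samples whose predicted label matches y."""
    pred = (z @ w + b) > 0
    return float(np.mean(pred == (y == 1)))

rng = np.random.default_rng(0)

# Synthetic stand-ins for latent activations, with a label that is a
# linear function of two of the dimensions (an assumed toy concept).
z = rng.normal(size=(1000, 8))
y = (z[:, 3] - z[:, 7] > 0).astype(float)

z_train, y_train = z[:800], y[:800]
z_test, y_test = z[800:], y[800:]

w, b = train_linear_probe(z_train, y_train)
acc_clean = probe_accuracy(w, b, z_test, y_test)

# Assumed degradation: the same held-out latents with added noise.
z_noisy = z_test + 1.0 * rng.normal(size=z_test.shape)
acc_noisy = probe_accuracy(w, b, z_noisy, y_test)
print(f"probe accuracy, clean: {acc_clean:.3f}, noisy: {acc_noisy:.3f}")
```

A probe is a deliberately weak reader: if a linear classifier can still pick out the concept from the latents, the information is accessible in a simple form. Comparing probe accuracy across training stages is one way to track how decodability changes as tasks accumulate.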
Topics
- Continual Learning
- Catastrophic Forgetting
- Sparse Autoencoders
- Concept-Level Analysis
- Representational Drift
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.