Lost or Hidden? A Concept-Level Forgetting in Supervised Continual Learning

2026-05-19 · Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

Filus et al. propose a diagnostic framework that uses Sparse Autoencoders (SAEs) to analyze catastrophic forgetting in continual learning at the concept level. The framework moves beyond task-level performance metrics to track how task-specific information evolves within a vision model's representation space. The researchers define a task-anchored latent feature space in which individual SAE latents act as concept proxies, then decompose forgetting into three components: apparent concept deletion, recoverability, and decodability. Their experiments with ResNet18 on the 2seq-CIFAR10, 2seq-tiny-ImageNet, and 10seq-tiny-ImageNet benchmarks indicate that a significant portion of seemingly lost concept-level information can be recovered under a linearity assumption. This suggests that forgetting often stems from changes in representational accessibility rather than complete information erasure, although concept decodability degrades as more tasks are introduced.

Key takeaway

For research scientists developing continual learning models, it matters whether forgetting at the concept level reflects true information loss or mere representational drift, and SAE-based diagnostics can help tell the two apart. Consider prioritizing strategies such as Learning without Forgetting (LwF) that maintain linear recoverability of fine-grained concepts, since this preserves model interpretability and stability, even at the cost of slightly lower per-task performance than methods like DER++.

Key insights

Catastrophic forgetting in continual learning is often representational misalignment, not complete information erasure.

Principles

Forgetting can be decomposed into deletion, recoverability, and decodability.
Linear translation can restore much of the lost concept-level information.
Continual learning strategies differ in fine-grained information preservation.
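
To make the first component of the decomposition concrete, here is a minimal sketch of an apparent-deletion rate. The function name, the activity threshold eps, and the toy activation matrices are all assumptions made for illustration; the paper's exact metric may be defined differently.

```python
import numpy as np

def apparent_deletion(z_before, z_after, eps=1e-6):
    """Fraction of latents that fired before but never fire after.

    z_before, z_after: (samples, latents) non-negative SAE activations
    computed on the same task-1 inputs at two points in training.
    This definition is a hypothetical stand-in, not the paper's formula.
    """
    alive_before = (z_before > eps).any(axis=0)  # latent fires on >= 1 sample
    alive_after = (z_after > eps).any(axis=0)
    deleted = alive_before & ~alive_after
    return deleted.sum() / max(alive_before.sum(), 1)

# Toy activations: 2 samples, 4 concept latents.
z_before = np.array([[0.9, 0.0, 0.3, 0.0],
                     [0.4, 0.7, 0.0, 0.0]])
z_after = np.array([[0.8, 0.0, 0.0, 0.0],
                    [0.5, 0.0, 0.0, 0.2]])

rate = apparent_deletion(z_before, z_after)  # latents 1 and 2 go silent
```

A latent that goes silent is only apparently deleted: the recoverability and decodability checks ask whether its information is still present in a different form.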

Method

Train Sparse Autoencoders (SAEs) on task data to define a fixed, task-anchored latent space. Analyze concept activation dynamics, linear recoverability via translation, and concept-level decodability using linear probes.
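
The recoverability step can be sketched as follows: fit a linear map from post-training features back to the original feature space, pass the translated features through the frozen SAE encoder, and compare concept activations against the reference. Everything here is an illustrative assumption: the features are synthetic, the drift is exactly linear (a best case), the encoder weights are random rather than trained, and the relative-error score is a stand-in for whatever measure the paper uses.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, k = 512, 16, 32  # samples, feature dim, SAE latents (all hypothetical)

# Features of the same task-1 inputs before and after learning task 2.
feats_before = rng.normal(size=(n, d))
drift = rng.normal(size=(d, d))      # assumed purely linear drift
feats_after = feats_before @ drift

# Frozen SAE encoder anchored to task 1 (random weights, illustration only).
W_enc = rng.normal(size=(d, k)) / np.sqrt(d)
b_enc = -0.5 * np.ones(k)

def sae_encode(x):
    """ReLU encoder; each output column is treated as one concept proxy."""
    return np.maximum(x @ W_enc + b_enc, 0.0)

def fit_translation(src, dst):
    """Least-squares linear map T such that src @ T approximates dst."""
    T, *_ = np.linalg.lstsq(src, dst, rcond=None)
    return T

def rel_error(z, z_ref):
    """Relative Frobenius error between two activation matrices."""
    return np.linalg.norm(z - z_ref) / np.linalg.norm(z_ref)

# Fit the translation on one half, evaluate on the held-out half.
half = n // 2
T = fit_translation(feats_after[:half], feats_before[:half])

z_ref = sae_encode(feats_before[half:])
err_drifted = rel_error(sae_encode(feats_after[half:]), z_ref)
err_translated = rel_error(sae_encode(feats_after[half:] @ T), z_ref)
```

Because the synthetic drift is exactly linear, the translated activations match the reference almost perfectly. On real models the residual error after translation is the quantity of interest: it indicates how much concept-level information is not linearly recoverable.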

In practice

Use SAEs to diagnose fine-grained forgetting in vision models.
Employ linear translation to assess concept recoverability.
Evaluate concept decodability with linear classifiers.
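
As a rough illustration of the last item, the sketch below trains a linear probe to predict whether a single concept was active, using only the current model's features. The data is synthetic, the concept label is generated from a hypothetical linear direction, and the probe is a plain logistic regression trained by gradient descent; none of these choices come from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 600, 16  # samples and feature dimension (hypothetical)

feats = rng.normal(size=(n, d))       # stand-in for current-model features
w_true = rng.normal(size=d)           # hypothetical concept direction
# 1.0 where the reference SAE latent is assumed to have fired.
labels = (feats @ w_true > 0).astype(float)

def train_probe(x, y, lr=0.5, steps=500):
    """Logistic-regression probe fitted with full-batch gradient descent."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(x @ w + b)))  # predicted probability
        grad = p - y                            # dLoss/dlogit
        w -= lr * (x.T @ grad) / len(y)
        b -= lr * grad.mean()
    return w, b

def probe_accuracy(x, y, w, b):
    """Share of samples where the probe's sign matches the label."""
    return float(((x @ w + b > 0) == (y > 0.5)).mean())

# Train on the first 400 samples, score on the remaining 200.
w, b = train_probe(feats[:400], labels[:400])
acc = probe_accuracy(feats[400:], labels[400:], w, b)
```

Repeating this per concept and per training stage gives a decodability curve; a drop in held-out accuracy as tasks accumulate would match the degradation the summary describes.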

Topics

Continual Learning
Catastrophic Forgetting
Sparse Autoencoders
Concept-Level Analysis
Representational Drift

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.