Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation
Summary
"Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation" develops a function-space theory of catastrophic forgetting in the Neural Tangent Kernel (NTK) regime. In this view, training on a new task shifts old-task predictions through the cross-task kernel, which yields a closed-form predictor of the forgetting vector that can be evaluated before any new-task gradient step is taken. The predictor is exact for frozen-backbone, linear-head PEFT-CL models, which are linear in their trainable parameters, and serves as a local NTK approximation for nonlinear adapters or full fine-tuning. The theory also shows that forgetting concentrates in a small number of old-task NTK eigenmodes and, for frozen linear heads, establishes a Kronecker scaling rule for the vulnerable rank. These findings clarify existing NTK-overlap theory, explain the limitations of parameter-space regularizers, and motivate a new targeted spectral regularizer.
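The Kronecker structure mentioned for frozen linear heads can be illustrated with a small numerical check. The sketch below is not the paper's rule for the vulnerable rank, which this summary does not spell out. It only verifies the standard fact that a linear head f(x) = W phi(x) with C outputs on frozen features has the empirical NTK G ⊗ I_C, where G is the feature Gram matrix, so the kernel rank is rank(G) times C. The function name and the shapes are illustrative assumptions.

```python
import numpy as np

def linear_head_ntk(features: np.ndarray, num_outputs: int) -> np.ndarray:
    """Empirical NTK of f(x) = W @ phi(x) with respect to W, built from explicit Jacobians.

    features: (n, d) frozen backbone features, one row per sample (assumed given).
    Returns an (n*C, n*C) matrix with rows ordered by (sample, output).
    """
    eye = np.eye(num_outputs)
    # Jacobian of the C outputs at one sample w.r.t. row-major vec(W): kron(I_C, phi^T)
    jac = np.concatenate([np.kron(eye, phi[None, :]) for phi in features], axis=0)
    return jac @ jac.T

# Illustrative sizes: 6 samples, 3 frozen features, 4 outputs
rng = np.random.default_rng(0)
phi = rng.normal(size=(6, 3))
ntk = linear_head_ntk(phi, num_outputs=4)
gram = phi @ phi.T
kron_form = np.kron(gram, np.eye(4))   # Kronecker form G (x) I_C
ntk_rank = np.linalg.matrix_rank(ntk)  # equals rank(G) * C
```

Under these assumptions, the rank of the head's kernel grows linearly with the number of outputs, which is one plausible reading of a Kronecker scaling rule.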
Key takeaway
For machine learning engineers building continual learning systems, this theory suggests rethinking regularization strategy. Consider investigating spectral regularizers that specifically target low-rank output-space interference rather than relying solely on parameter-space methods, which may miss critical forgetting mechanisms. This understanding can guide the design of more robust and efficient adaptation algorithms, improving model stability when new tasks are learned incrementally.
Key insights
Catastrophic forgetting is a low-rank function-space phenomenon, predictable via a cross-task kernel in the NTK regime.
Principles
- Forgetting manifests as output-space interference.
- The NTK predicts forgetting before any new-task gradient step.
- Forgetting concentrates in low-rank eigenmodes.
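One way to check the low-rank claim on your own model is to project a measured or predicted forgetting vector onto the eigenvectors of the old-task kernel and see how quickly the energy accumulates. The following is a minimal sketch under assumptions: `k_old` stands for an old-task NTK Gram matrix and `forgetting` for the vector of old-task prediction changes, both hypothetical inputs you would compute yourself. The 95% threshold is an arbitrary illustrative choice, not a value from the paper.

```python
import numpy as np

def mode_energy(k_old: np.ndarray, forgetting: np.ndarray) -> np.ndarray:
    """Cumulative fraction of ||forgetting||^2 captured by the leading eigenmodes of k_old."""
    eigvals, eigvecs = np.linalg.eigh(k_old)    # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]           # reorder to descending
    coeffs = eigvecs[:, order].T @ forgetting   # coordinates in the eigenbasis
    energy = coeffs ** 2
    return np.cumsum(energy) / energy.sum()

def effective_rank(cumulative: np.ndarray, threshold: float = 0.95) -> int:
    """Smallest number of leading modes whose cumulative energy reaches the threshold."""
    idx = int(np.searchsorted(cumulative, threshold)) + 1
    return min(idx, len(cumulative))
```

If forgetting is low-rank in the sense described above, `effective_rank` should come out much smaller than the number of old-task samples.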
Method
The paper derives a closed-form predictor for the forgetting vector from the cross-task kernel in the NTK regime. The predictor quantifies the old-task prediction drift that new-task training will induce, and it can be evaluated before any gradient step is taken.
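To make the idea concrete, here is a minimal sketch for the case where the model is linear in its trainable parameters, f(x) = phi(x)^T theta. It assumes a squared loss 0.5 * ||phi_new theta - y_new||^2 and plain gradient descent, which this summary does not confirm as the paper's setup, so treat the formula as the standard kernel-regression result rather than the paper's exact predictor. Under those assumptions the old-task drift after T steps is K_cross K_new^{-1} (I - (I - lr K_new)^T) r_0, where r_0 is the new-task residual before training. All function names are illustrative.

```python
import numpy as np

def predict_forgetting(phi_old, phi_new, residual_new, lr, steps):
    """Predicted old-task drift after `steps` GD steps, computed without running any step.

    residual_new is y_new - phi_new @ theta0, evaluated before training.
    """
    k_cross = phi_old @ phi_new.T   # cross-task kernel, shape (n_old, n_new)
    k_new = phi_new @ phi_new.T     # new-task kernel, shape (n_new, n_new)
    eigvals, eigvecs = np.linalg.eigh(k_new)
    positive = eigvals > 1e-12
    safe = np.where(positive, eigvals, 1.0)
    # Per-mode gain (1 - (1 - lr*lam)^steps) / lam; its limit as lam -> 0 is lr*steps
    gain = np.where(positive, (1.0 - (1.0 - lr * eigvals) ** steps) / safe, lr * steps)
    return k_cross @ (eigvecs @ (gain * (eigvecs.T @ residual_new)))

def run_gd(phi_new, y_new, theta0, lr, steps):
    """Brute-force reference: actually run gradient descent on the new task."""
    theta = theta0.copy()
    for _ in range(steps):
        theta -= lr * phi_new.T @ (phi_new @ theta - y_new)
    return theta
```

Comparing `predict_forgetting(...)` with `phi_old @ (run_gd(...) - theta0)` on random features is a quick sanity check. For a model that is truly linear in its parameters the two should agree to numerical precision, while for nonlinear adapters the prediction would only be a local approximation.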
In practice
- Design targeted spectral regularizers.
- Understand limits of parameter-space regularizers.
- Inform continual learning algorithm design.
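As a starting point for experimenting with the first item, the sketch below shows one hypothetical form a targeted spectral penalty could take for a model that is linear in its parameters. It is not the paper's regularizer, whose definition is not given in this summary. It assumes you already have an orthonormal basis `modes` for the output-space directions you want to protect (for example, leading eigenvectors of an old-task kernel) and it penalizes old-task prediction drift only along those directions.

```python
import numpy as np

def spectral_penalty(delta_theta, phi_old, modes, strength=1.0):
    """Hypothetical penalty strength * ||modes^T (phi_old @ delta_theta)||^2 and its gradient.

    delta_theta: (p,) parameter update relative to the old-task solution.
    phi_old:     (n_old, p) old-task features (model assumed linear in parameters).
    modes:       (n_old, k) orthonormal basis of protected output directions (assumed given).
    """
    drift = phi_old @ delta_theta   # old-task prediction drift
    proj = modes.T @ drift          # drift coordinates along the protected modes
    value = strength * float(proj @ proj)
    grad = 2.0 * strength * (phi_old.T @ (modes @ proj))
    return value, grad
```

Unlike a parameter-space penalty on `delta_theta` itself, this term leaves updates unpenalized when they move old-task outputs only in directions outside `modes`. The gradient can be added to the new-task gradient during training.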
Topics
- Catastrophic Forgetting
- Continual Learning
- Neural Tangent Kernel
- Function Space Theory
- Spectral Regularization
- PEFT-CL
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.