Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation

2026-06-16 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

"Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation" introduces a function-space theory of catastrophic forgetting in continual adaptation, formulated in the Neural Tangent Kernel (NTK) regime. It shows that new-task training makes old-task predictions drift through the cross-task kernel. This yields a closed-form predictor of the forgetting vector that can be evaluated before any new-task gradient step is taken. The predictor is exact for frozen-backbone, linear-head PEFT-CL (parameter-efficient fine-tuning for continual learning) models, because these models are linear in their trainable parameters. For nonlinear adapters or full fine-tuning, it serves as a local NTK approximation. Crucially, the theory shows that forgetting concentrates in a small number of old-task NTK eigenmodes. For frozen linear heads, it also establishes a Kronecker scaling rule for the vulnerable rank. These findings clarify existing NTK-overlap theory, explain the limitations of parameter-space regularizers, and motivate a new targeted spectral regularizer.
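The Kronecker structure mentioned above can be checked directly for a frozen backbone with a trainable linear head f(x) = W φ(x). The sketch below is a minimal illustration, not the paper's code. It assumes NumPy, random stand-in features, and the hypothetical helper names `linear_head_ntk` and `explicit_jacobian`. It builds the cross-task NTK as the backbone feature Gram matrix Kronecker-multiplied with the identity over outputs, then compares the result with the Jacobian product computed entry by entry. The example only shows why kernel rank scales with the number of outputs (rank(A ⊗ B) = rank(A) · rank(B)). It does not reproduce the paper's specific vulnerable-rank rule.

```python
import numpy as np

def linear_head_ntk(phi_x, phi_y, num_outputs):
    """NTK of f(x) = W @ phi(x) w.r.t. the trainable head W (num_outputs x d).

    Rows/columns are ordered (sample, output): index = i * num_outputs + c.
    """
    feature_kernel = phi_x @ phi_y.T  # frozen-backbone feature Gram matrix
    return np.kron(feature_kernel, np.eye(num_outputs))

def explicit_jacobian(phi, num_outputs):
    """Jacobian of the stacked outputs w.r.t. W flattened row-major."""
    n, d = phi.shape
    J = np.zeros((n * num_outputs, num_outputs * d))
    for i in range(n):
        for c in range(num_outputs):
            # d f_c(x_i) / d W[c, j] = phi_j(x_i); other rows of W have zero effect
            J[i * num_outputs + c, c * d:(c + 1) * d] = phi[i]
    return J

rng = np.random.default_rng(0)
phi_a = rng.normal(size=(4, 6))  # hypothetical frozen features, old task
phi_b = rng.normal(size=(3, 6))  # hypothetical frozen features, new task
C = 2                            # number of head outputs

K_ab = linear_head_ntk(phi_a, phi_b, C)
K_ab_explicit = explicit_jacobian(phi_a, C) @ explicit_jacobian(phi_b, C).T
print(np.allclose(K_ab, K_ab_explicit))  # True
print(np.linalg.matrix_rank(K_ab), C * np.linalg.matrix_rank(phi_a @ phi_b.T))
```

Because the head's NTK factorizes as (feature kernel) ⊗ I, every spectral quantity of the cross-task kernel is inherited from the much smaller feature Gram matrix.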

Key takeaway

For machine learning engineers building continual learning systems, this theory suggests rethinking regularization strategies. Consider spectral regularizers that directly target low-rank output-space interference. Relying solely on parameter-space methods may miss the mechanism that actually drives forgetting. This view can guide the design of more robust and efficient adaptation algorithms and improve model stability when new tasks are learned incrementally.

Key insights

Catastrophic forgetting is a low-rank function-space phenomenon, predictable via a cross-task kernel in the NTK regime.

Principles

Forgetting manifests as output-space interference.
The NTK predicts forgetting before any new-task gradient step.
Forgetting concentrates in low-rank eigenmodes.
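The low-rank principle can be measured by projecting an old-task drift vector onto the eigenvectors of the old-task kernel and checking how much of its energy the leading modes hold. The sketch below is illustrative and rests on several assumptions. It uses a single-output linear model on hypothetical random features with a decaying spectrum, NumPy, and a helper named `mode_energy` that is ours, not the paper's. For such a model, any gradient-based training on new-task data moves the weights within the span of the new-task features. The resulting old-task drift therefore lies in the column space of the cross-task kernel, which is how the example drift is generated.

```python
import numpy as np

def mode_energy(K_old, delta_f):
    """Cumulative share of ||delta_f||^2 captured by the top-k eigenvectors of K_old."""
    eigvals, eigvecs = np.linalg.eigh(K_old)  # eigh returns ascending eigenvalues
    U = eigvecs[:, np.argsort(eigvals)[::-1]]  # reorder: largest eigenvalue first
    coeffs = U.T @ delta_f                     # drift coordinates in the eigenbasis
    energy = coeffs ** 2
    return np.cumsum(energy) / energy.sum()

rng = np.random.default_rng(1)
d = 50
scales = 1.0 / np.arange(1, d + 1)  # hypothetical decaying feature spectrum
phi_old = rng.normal(size=(40, d)) * scales
phi_new = rng.normal(size=(20, d)) * scales

K_old = phi_old @ phi_old.T     # old-task kernel
K_cross = phi_old @ phi_new.T   # cross-task kernel
# For a linear model, new-task training changes old outputs by K_cross @ alpha
delta_f = K_cross @ rng.normal(size=20)

cum = mode_energy(K_old, delta_f)
k = int(np.searchsorted(cum, 0.95)) + 1
print(f"{k} of {len(cum)} old-task eigenmodes carry 95% of the drift energy")
```

The count `k` depends on the assumed spectrum and is not a result from the paper. The diagnostic itself applies to any measured or predicted forgetting vector.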

Method

A closed-form predictor for the forgetting vector is derived from the cross-task kernel in the NTK regime. It quantifies the old-task prediction drift that new-task training will induce, and it can be evaluated before any gradient step is taken.
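To make the idea concrete, here is a minimal sketch of a closed-form forgetting predictor of this kind. It uses the standard linearized-NTK dynamics and is not necessarily the paper's exact formula. Assumptions: squared loss 0.5 · ||f(X_b) − y_b||², full-batch gradient descent with step size `lr`, a scalar output, an invertible new-task kernel K_bb, and a hypothetical helper named `predicted_forgetting`. The predictor uses only K_ab, K_bb and the initial new-task residual. Its output is then compared with the drift produced by actually running gradient descent on a frozen-feature linear head, where the formula is exact because the model is linear in its trainable parameters.

```python
import numpy as np

def predicted_forgetting(K_ab, K_bb, r0, lr, steps):
    """Closed-form old-task drift after `steps` of GD on 0.5 * ||f(X_b) - y_b||^2.

    Needs only pre-training quantities: cross-task kernel K_ab, new-task kernel
    K_bb (assumed invertible) and the initial new-task residual r0.
    """
    n = K_bb.shape[0]
    M = np.eye(n) - lr * K_bb                 # per-step contraction of the residual
    decay = np.linalg.matrix_power(M, steps)  # (I - lr*K_bb)^steps
    # drift = -lr * K_ab @ sum_{j<steps} M^j r0 = -K_ab K_bb^{-1} (I - M^steps) r0
    return -K_ab @ np.linalg.solve(K_bb, (np.eye(n) - decay) @ r0)

rng = np.random.default_rng(0)
d, n_a, n_b = 30, 8, 10
phi_a = rng.normal(size=(n_a, d))  # hypothetical frozen old-task features
phi_b = rng.normal(size=(n_b, d))  # hypothetical frozen new-task features
y_b = rng.normal(size=n_b)
w0 = rng.normal(size=d)            # head weights after old-task training

K_ab, K_bb = phi_a @ phi_b.T, phi_b @ phi_b.T
lr = 1.0 / np.linalg.eigvalsh(K_bb).max()  # stable step size
steps = 200
r0 = phi_b @ w0 - y_b
pred = predicted_forgetting(K_ab, K_bb, r0, lr, steps)

# Ground truth: run the gradient steps and measure how old-task outputs moved
w = w0.copy()
for _ in range(steps):
    w -= lr * phi_b.T @ (phi_b @ w - y_b)  # gradient of 0.5*||phi_b w - y_b||^2
actual = phi_a @ w - phi_a @ w0
print(np.allclose(pred, actual))  # True
```

As `steps` grows with a stable `lr`, the decay term vanishes and the drift tends to −K_ab K_bb⁻¹ r0. This makes explicit that forgetting is zero whenever the cross-task kernel K_ab is zero.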

In practice

Design targeted spectral regularizers.
Understand limits of parameter-space regularizers.
Inform continual learning algorithm design.
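The summary does not give the paper's regularizer, so the following is only a hypothetical illustration of the contrast it draws. It compares a spectral penalty on old-task output drift inside the top-k eigenmodes of the old-task kernel with a plain L2 penalty on weight change. Assumptions: a single-output frozen-feature linear head, random stand-in features, NumPy, and the helper names `top_modes`, `spectral_penalty` and `l2_param_penalty`, all of which are ours. Two updates are compared. The first moves the weights in the null space of the old-task features, which changes weights but leaves old outputs untouched. The second takes a small step along the dominant old-task feature direction, which changes old outputs.

```python
import numpy as np

def top_modes(K_old, k):
    """Top-k eigenvectors (columns) of the old-task kernel."""
    eigvals, eigvecs = np.linalg.eigh(K_old)
    return eigvecs[:, np.argsort(eigvals)[::-1][:k]]

def spectral_penalty(f_old_now, f_old_ref, U_k, lam=1.0):
    """Hypothetical penalty: old-task output drift projected onto the top-k modes."""
    coeffs = U_k.T @ (f_old_now - f_old_ref)
    return lam * float(coeffs @ coeffs)

def l2_param_penalty(w, w_ref, lam=1.0):
    """Parameter-space baseline: every weight change costs the same."""
    diff = w - w_ref
    return lam * float(diff @ diff)

rng = np.random.default_rng(0)
d, n_a = 20, 5
phi_a = rng.normal(size=(n_a, d))  # hypothetical old-task features
w_ref = rng.normal(size=d)         # head after old-task training
U_k = top_modes(phi_a @ phi_a.T, k=2)

_, _, Vt = np.linalg.svd(phi_a)    # full SVD: Vt is d x d
w_null = w_ref + 3.0 * Vt[-1]      # null-space move (d > n_a): old outputs unchanged
w_top = w_ref + 0.3 * Vt[0]        # small move along the dominant feature direction

results = {}
for name, w in [("null-space", w_null), ("top-direction", w_top)]:
    results[name] = (l2_param_penalty(w, w_ref),
                     spectral_penalty(phi_a @ w, phi_a @ w_ref, U_k))
    print(name, [round(v, 3) for v in results[name]])
```

In this setup the L2 penalty charges the harmless null-space move heavily and barely notices the move that actually shifts old-task outputs. The spectral penalty does the reverse, which is the kind of mismatch between parameter-space and function-space regularization that the summary describes.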

Topics

Catastrophic Forgetting
Continual Learning
Neural Tangent Kernel
Function Space Theory
Spectral Regularization
PEFT-CL

Best for: Research Scientist, AI Scientist, Machine Learning Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.