Self-Distillation as a Performance Recovery Mechanism for LLMs: Counteracting Compression and Catastrophic Forgetting

2026-04-21 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

A new framework based on Self-Distillation Fine-Tuning (SDFT) counteracts performance degradation in Large Language Models (LLMs) caused by catastrophic forgetting during Supervised Fine-Tuning (SFT), by quantization, and by pruning. The framework uses the model's own historical states as the teacher, so it restores capabilities without external dependencies. As a theoretical explanation, the research posits that an LLM's generative capability relies on the high-dimensional manifold constructed by its hidden layers. Centered Kernel Alignment (CKA) quantifies the alignment between student and teacher activation trajectories, and performance recovery correlates strongly with manifold alignment. Experiments on Qwen2.5-3B-Instruct and Qwen2.5-7B-Instruct across Tooluse and Science tasks show that SDFT restores performance, with gains of +15-22% in task-specific accuracy for quantization recovery and significant restoration of general capabilities.

Key takeaway

For NLP engineers and research scientists dealing with LLM degradation from fine-tuning or compression, the Self-Distillation Fine-Tuning (SDFT) framework offers a recovery mechanism. Consider using SDFT with historical model states as teachers to restore task performance and general capabilities, rather than resorting to costly retraining. Integrating Centered Kernel Alignment (CKA) into your evaluation pipeline also gives you a quantitative metric for diagnosing the severity of capability loss and for monitoring how well recovery is working.

Key insights

Self-distillation recovers LLM performance by realigning the student model's high-dimensional internal representation manifold with the teacher's optimal structure.

Principles

LLM generative capability relies on hidden layer manifold structure.
SDFT acts as an "anchor" pulling degraded parameters to a high-performance manifold.
CKA quantifies manifold alignment and is invariant to orthogonal transformations of the representations.
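
To make the CKA principle concrete, here is a minimal sketch of linear CKA between two activation matrices. It assumes the linear-kernel variant, which the summary does not confirm is the one used in the research, and the names linear_cka, teacher, rotated, and unrelated are hypothetical. The demo applies a random orthogonal rotation to synthetic "teacher" activations to illustrate the invariance claim, then compares against unrelated random activations.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between activation matrices of shape (n_samples, dim).

    Assumption: linear kernel; the source does not say which kernel is used.
    """
    # Center every feature column so mean offsets do not affect the score.
    x = x - x.mean(axis=0, keepdims=True)
    y = y - y.mean(axis=0, keepdims=True)
    # CKA = ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


rng = np.random.default_rng(0)

# Synthetic stand-in for one layer's teacher activations (256 inputs, 64 dims).
teacher = rng.normal(size=(256, 64))

# Random orthogonal matrix from a QR decomposition, used to rotate the features.
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))
rotated = teacher @ q

# Independent activations with no shared structure.
unrelated = rng.normal(size=(256, 64))

cka_rotated = linear_cka(teacher, rotated)      # stays at 1 up to rounding
cka_unrelated = linear_cka(teacher, unrelated)  # clearly below 1
```

In a real diagnostic, teacher and student activations would be collected layer by layer on the same inputs, and a drop in the per-layer score would flag where the student's representation has drifted.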

Method

The Self-Distillation Recovery Framework uses a model's historical checkpoints as a teacher to guide a degraded student model, optimizing for both capability recovery and task adaptation through a dual-objective process.
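
One way to picture the dual-objective process is as a weighted sum of a task term and a recovery term. The sketch below is an assumption, not the published objective: it pairs cross-entropy on task labels (task adaptation) with a KL divergence from the teacher checkpoint's output distribution (capability recovery). The function name dual_objective_loss and the parameters distill_weight and temperature are hypothetical, and the logits are synthetic.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def dual_objective_loss(student_logits, teacher_logits, labels,
                        distill_weight=0.5, temperature=1.0):
    """Return (total, task_loss, kl) for a batch of shape (batch, vocab).

    Assumed form: total = CE(student, labels) + distill_weight * KL(teacher || student).
    """
    # Task adaptation term: cross-entropy of the student against task labels.
    student_logp = log_softmax(student_logits)
    task_loss = -student_logp[np.arange(len(labels)), labels].mean()

    # Recovery term: KL from the teacher checkpoint to the student,
    # computed on temperature-scaled distributions.
    t_logp = log_softmax(teacher_logits / temperature)
    s_logp = log_softmax(student_logits / temperature)
    kl = (np.exp(t_logp) * (t_logp - s_logp)).sum(axis=-1).mean()

    total = task_loss + distill_weight * kl
    return float(total), float(task_loss), float(kl)


rng = np.random.default_rng(0)
teacher_logits = rng.normal(size=(4, 10))
labels = np.array([1, 3, 5, 7])

# A "degraded" student: teacher logits perturbed by noise.
degraded_logits = teacher_logits + rng.normal(scale=2.0, size=(4, 10))

total_same, task_same, kl_same = dual_objective_loss(
    teacher_logits, teacher_logits, labels)
total_deg, task_deg, kl_deg = dual_objective_loss(
    degraded_logits, teacher_logits, labels)
```

When the student matches the teacher the recovery term vanishes and only the task term remains; distill_weight then controls how strongly a drifted student is pulled back toward the checkpoint.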

In practice

Apply SDFT to recover LLM performance post-SFT, quantization, or pruning.
Use CKA as a diagnostic tool for forgetting severity and manifold misalignment.
For small LLMs, bootstrap in-context learning (ICL) via off-policy distillation before applying SDFT.

Topics

Self-Distillation Fine-Tuning
LLM Performance Recovery
Catastrophic Forgetting
Model Compression
High-Dimensional Manifold

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.