Self-Distillation as a Performance Recovery Mechanism for LLMs: Counteracting Compression and Catastrophic Forgetting
Summary
A new framework based on Self-Distillation Fine-Tuning (SDFT) counteracts performance degradation in Large Language Models (LLMs) caused by catastrophic forgetting during Supervised Fine-Tuning (SFT), by quantization, and by pruning. The framework uses the model's own historical states as the teacher, so it restores capabilities without external dependencies. The research also offers a theoretical explanation: an LLM's generative capability depends on the high-dimensional manifold formed by its hidden-layer activations. Centered Kernel Alignment (CKA) is used to quantify the alignment between student and teacher activation trajectories, and the results show a strong correlation between performance recovery and manifold alignment. Experiments on Qwen2.5-3B-Instruct and Qwen2.5-7B-Instruct across Tooluse and Science tasks show that SDFT restores performance, with gains of +15-22% in task-specific accuracy for quantization recovery and significant restoration of general capabilities.
Key takeaway
For NLP engineers and research scientists dealing with LLM degradation from fine-tuning or compression, the Self-Distillation Fine-Tuning (SDFT) framework offers a robust recovery mechanism. Consider using SDFT with historical model states as teachers to restore task performance and general capabilities before resorting to costly retraining. Adding Centered Kernel Alignment (CKA) to your evaluation pipeline also gives you a quantitative metric for diagnosing the severity of capability loss and for monitoring how well recovery is working.
Key insights
Self-distillation recovers LLM performance by realigning the student model's high-dimensional internal representation manifold with the teacher's optimal structure.
Principles
- LLM generative capability relies on hidden layer manifold structure.
- SDFT acts as an "anchor" pulling degraded parameters to a high-performance manifold.
- CKA quantifies manifold alignment and is invariant to orthogonal transformations of the representations.
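To make the CKA principle concrete, here is a minimal NumPy sketch of linear CKA between two activation matrices. The summary does not say which CKA variant (linear or kernel) the authors use, so the linear form is an assumption, and the function name `linear_cka` and the `(samples, features)` layout are illustrative choices rather than anything taken from the paper.

```python
import numpy as np


def linear_cka(X, Y):
    """Linear CKA between two activation matrices.

    X: (n_samples, d1) activations from one model (e.g. the teacher).
    Y: (n_samples, d2) activations from another (e.g. the student),
       collected on the same n_samples inputs.
    Returns a similarity in [0, 1]; 1 means the representations match
    up to an orthogonal transformation and isotropic scaling.
    """
    # Center each feature so the score compares structure, not offsets.
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # ||Y^T X||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    cross = np.linalg.norm(Y.T @ X, ord="fro") ** 2
    norm_x = np.linalg.norm(X.T @ X, ord="fro")
    norm_y = np.linalg.norm(Y.T @ Y, ord="fro")
    return float(cross / (norm_x * norm_y))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    teacher_acts = rng.normal(size=(64, 16))  # hypothetical hidden states

    # Rotating the teacher's features by a random orthogonal matrix
    # should leave the CKA score at (numerically) 1.
    rotation, _ = np.linalg.qr(rng.normal(size=(16, 16)))
    print("rotated copy:", linear_cka(teacher_acts, teacher_acts @ rotation))

    # Unrelated activations should score noticeably lower.
    print("unrelated:   ", linear_cka(teacher_acts, rng.normal(size=(64, 16))))
```

In a diagnostic setting one would compute this score per layer, using teacher and student activations collected on the same inputs, and watch for layers where the score drops after fine-tuning or compression.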
Method
The Self-Distillation Recovery Framework uses a model's historical checkpoints as a teacher to guide a degraded student model, optimizing for both capability recovery and task adaptation through a dual-objective process.
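The summary does not give the exact form of the dual objective. One common way to combine task adaptation with distillation from a teacher checkpoint is a cross-entropy term on task labels plus a weighted KL term toward the teacher's output distribution. The sketch below assumes that form; `dual_objective_loss`, `distill_weight`, and `temperature` are hypothetical names and hyperparameters, not the paper's.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def dual_objective_loss(student_logits, teacher_logits, labels,
                        distill_weight=0.5, temperature=1.0):
    """Assumed dual objective: task cross-entropy + weighted KL to the teacher.

    student_logits, teacher_logits: (batch, vocab) arrays.
    labels: (batch,) integer class or token ids for the task.
    Returns (total_loss, task_loss, distill_loss).
    """
    # Task adaptation: cross-entropy of the student on the task labels.
    log_p_student = log_softmax(student_logits)
    task_loss = -log_p_student[np.arange(len(labels)), labels].mean()

    # Capability recovery: KL(teacher || student) on softened distributions.
    # (No temperature**2 rescaling is applied here; that is a design choice.)
    log_q_teacher = log_softmax(teacher_logits / temperature)
    log_q_student = log_softmax(student_logits / temperature)
    distill_loss = (np.exp(log_q_teacher)
                    * (log_q_teacher - log_q_student)).sum(axis=-1).mean()

    total = task_loss + distill_weight * distill_loss
    return total, task_loss, distill_loss
```

With this form, a student that already matches the teacher incurs zero distillation penalty, so the KL term only pulls on parameters where the degraded model has drifted from the historical checkpoint.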
In practice
- Apply SDFT to recover LLM performance post-SFT, quantization, or pruning.
- Use CKA as a diagnostic tool for forgetting severity and manifold misalignment.
- For small LLMs, bootstrap in-context learning (ICL) via off-policy distillation before applying SDFT.
Topics
- Self-Distillation Fine-Tuning
- LLM Performance Recovery
- Catastrophic Forgetting
- Model Compression
- High-Dimensional Manifold
Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.