MIT's new fine-tuning method lets LLMs learn new skills without losing old ones
Summary
Researchers from MIT, the Improbable AI Lab, and ETH Zurich have developed a fine-tuning method called Self-Distillation Fine-Tuning (SDFT) that lets large language models (LLMs) acquire new skills and knowledge without catastrophically forgetting earlier capabilities. Traditional supervised fine-tuning (SFT) often causes performance to regress on older tasks. Reinforcement learning (RL) avoids much of that regression, but it needs a reward function, which is hard to define for complex enterprise scenarios, and it struggles to inject entirely new information. SDFT addresses both limitations by using the LLM's own in-context learning ability to create an on-policy learning loop: a frozen "teacher" version of the model, which is shown expert demonstrations, provides feedback to a "student" version. In experiments with the Qwen 2.5 model, SDFT outperformed SFT on science Q&A (70.2% accuracy vs. 66.2%), preserved the model's original knowledge, and succeeded at sequential learning across science, tool use, and medical reasoning tasks, offering a path to consolidating multiple skills into a single model.
Key takeaway
For AI scientists and NLP engineers building adaptive enterprise LLMs, SDFT offers a way around catastrophic forgetting. Models can be fine-tuned to acquire new, proprietary knowledge and skills sequentially without degrading existing capabilities, which could reduce the need for "model zoos" and lower inference costs. The code is available on GitHub, and integration into Hugging Face's TRL library is in progress. SDFT is most worth considering when defining an RL reward function is impractical and the base model has strong in-context learning (e.g., Qwen 3 models with 4B or more parameters).
Key insights
SDFT enables LLMs to learn new skills continually without forgetting old ones, using self-distillation and in-context learning.
Principles
- On-policy learning prevents catastrophic forgetting.
- In-context learning can create self-supervision.
- Single models can accumulate diverse skills.
Method
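SDFT pairs two versions of the same model: a frozen "teacher" that is given expert demonstrations in its context, and a "student" that is not. The student is trained to match the teacher through distillation, which creates an on-policy learning loop driven by in-context learning and requires no explicit reward function.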
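The loop described above can be illustrated with a toy sketch. This is not the authors' implementation: the prompt templates, the function names, and the choice of a per-token KL divergence from student to teacher as the distillation signal are all assumptions made for illustration, and the logits are hand-written stand-ins for real model outputs.

```python
import math


def softmax(logits):
    """Convert a list of logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def build_student_prompt(query):
    # Hypothetical template: the student sees only the query.
    return f"Question: {query}\nAnswer:"


def build_teacher_prompt(query, demonstration):
    # Hypothetical template: the frozen teacher also sees an expert
    # demonstration, so in-context learning shapes its predictions.
    return (
        f"Example of an expert answer:\n{demonstration}\n\n"
        f"Question: {query}\nAnswer:"
    )


def token_kl(student_probs, teacher_probs, eps=1e-12):
    """KL(student || teacher) for one token position (assumed divergence)."""
    return sum(
        s * math.log((s + eps) / (t + eps))
        for s, t in zip(student_probs, teacher_probs)
    )


def sequence_distillation_loss(student_logits, teacher_logits):
    """Average per-token divergence over a response sampled by the student.

    Both arguments are lists of per-token logit vectors scored on the
    same student-generated tokens, which is what makes the loop on-policy.
    """
    per_token = [
        token_kl(softmax(s), softmax(t))
        for s, t in zip(student_logits, teacher_logits)
    ]
    return sum(per_token) / len(per_token)


# Toy example: two token positions over a three-token vocabulary.
student_logits = [[2.0, 0.5, 0.1], [0.2, 1.5, 0.3]]
teacher_logits = [[2.5, 0.2, 0.0], [0.1, 2.0, 0.1]]
loss = sequence_distillation_loss(student_logits, teacher_logits)
```

In a real training run the two logit sequences would come from the same LLM evaluated on the two prompts, and the loss would be minimized with respect to the student's weights only. The loss is zero when the student already matches the teacher, so no hand-written reward function is involved.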
In practice
- Consolidate multiple LLM skills into one model.
- Reduce inference costs by hosting fewer models.
- Apply to domains lacking clear reward functions.
Topics
- Self-Distillation Fine-Tuning
- Continual Learning
- Large Language Models
- Catastrophic Forgetting
- In-Context Learning
Best for: AI Scientist, Research Scientist, NLP Engineer, Machine Learning Engineer, AI Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by VentureBeat.