RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting
Summary
RAFT is a two-stage framework that addresses the performance degradation associated with domain-specific supervised fine-tuning (SFT) by closing two gaps: supervision compatibility and trajectory preservation. The first stage refines the data, constructing model-compatible supervision through self-conditioned rewriting, semantic filtering, and answer fusion. The second stage applies Answer-Conditioned On-Policy Distillation: the original instruction-tuned model provides soft targets on student-generated trajectories, guided by the fused answer. The framework also integrates top-K temperature distillation and EMA-based adaptive loss balancing to stabilize the trade-off between domain and general performance. Across three instruction-tuned backbones and five domains, RAFT improved average domain accuracy by 23.2% over standard SFT. It also recovered SFT-induced degradation on MS-Bench and IFEval, with relative improvements of 18.2% and 10.2%, respectively.
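The summary does not state RAFT's balancing rule, so the following is only a minimal sketch of one common way EMA-based loss balancing can work. It assumes two loss terms (a domain SFT loss and a distillation loss) and weights each inversely to its running magnitude. The class `EMALossBalancer` and its `decay` default are hypothetical, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class EMALossBalancer:
    """Hypothetical EMA-based balancer (an assumption, not RAFT's published rule).

    Each loss term is weighted inversely to the exponential moving average of its
    magnitude, so that no single term dominates the combined objective.
    """
    decay: float = 0.9   # EMA decay factor (assumed value)
    eps: float = 1e-8    # guards against division by zero
    ema: dict = field(default_factory=dict)

    def update(self, losses: dict[str, float]) -> dict[str, float]:
        # Update the running average of each loss magnitude.
        for name, value in losses.items():
            prev = self.ema.get(name, value)  # first observation initializes the EMA
            self.ema[name] = self.decay * prev + (1.0 - self.decay) * value
        # Inverse-magnitude weights, normalized to sum to 1.
        inv = {name: 1.0 / (self.ema[name] + self.eps) for name in losses}
        total = sum(inv.values())
        return {name: v / total for name, v in inv.items()}

    def combine(self, losses: dict[str, float]) -> float:
        # Weighted sum of the current losses using the freshly updated weights.
        weights = self.update(losses)
        return sum(weights[name] * losses[name] for name in losses)
```

With losses of 2.0 (domain) and 0.5 (distillation) on the first step, the weights come out to 0.2 and 0.8, so each term contributes 0.4 to the combined loss. In a real training loop the weights would be computed from detached loss values and then multiplied with the loss tensors.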
Key takeaway
For machine learning engineers fine-tuning instruction-tuned models for specific domains, RAFT offers a way to mitigate forgetting. Its two-stage framework refines domain data and preserves the original model's behavior, improving in-domain accuracy while retaining general capabilities. Consider combining data refinement (self-conditioned rewriting, semantic filtering, and answer fusion) with adaptive distillation to achieve a better domain-general trade-off in your SFT pipelines.
Key insights
RAFT improves domain fine-tuning by refining data and preserving original model behavior through adaptive distillation.
Principles
- Domain SFT often degrades general capabilities.
- The two key gaps are supervision compatibility and trajectory preservation.
- Coupling data refinement with trajectory preservation is effective.
Method
RAFT first constructs model-compatible supervision via self-conditioned rewriting, semantic filtering, and answer fusion. It then applies Answer-Conditioned On-Policy Distillation, in which the original instruction-tuned model, guided by the fused answer, provides soft targets on student-generated trajectories.
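To make the first stage more concrete, the sketch below illustrates only a semantic-filtering step, under heavy assumptions. Candidate rewrites are passed in as strings (in RAFT they would come from the model's own self-conditioned rewriting), bag-of-words cosine similarity stands in for a real semantic similarity model, and the 0.6 threshold is arbitrary. Returning the closest surviving rewrite, or the reference when none survive, is a hypothetical placeholder for answer fusion, whose exact rule the summary does not describe. `bow_cosine` and `refine_example` are illustrative names.

```python
import math
import re
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Bag-of-words cosine similarity: a cheap stand-in for embedding similarity."""
    ta = Counter(re.findall(r"\w+", a.lower()))
    tb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(ta[w] * tb[w] for w in ta)
    norm = math.sqrt(sum(v * v for v in ta.values())) * math.sqrt(sum(v * v for v in tb.values()))
    return dot / norm if norm else 0.0


def refine_example(reference: str, rewrites: list[str], threshold: float = 0.6) -> str:
    """Hypothetical refinement step (threshold and fallback are assumptions).

    Keeps self-rewrites that stay semantically close to the reference answer and
    returns the closest one; falls back to the reference if none pass the filter.
    """
    scored = [(bow_cosine(reference, r), r) for r in rewrites]
    kept = [(s, r) for s, r in scored if s >= threshold]  # semantic filtering
    if not kept:
        return reference  # placeholder for answer fusion
    return max(kept, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    reference = "The capital of France is Paris."
    rewrites = ["Paris is the capital of France.", "I think it might be Lyon."]
    print(refine_example(reference, rewrites))  # prints: Paris is the capital of France.
```

The design intent is that a rewrite phrased in the model's own style is preferred as the training target, but only when it preserves the reference's content; an off-topic or wrong rewrite is filtered out.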
In practice
- Use self-conditioned rewriting for domain data.
- Apply answer-conditioned on-policy distillation.
- Implement top-K temperature distillation.
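The per-token loss behind the last two items can be sketched as follows. The summary names the technique but not its details, so this assumes forward KL from teacher to student, truncation to the teacher's top-K tokens with renormalization, and the common T^2 scaling; the function names are hypothetical. Teacher logits are assumed to come from the frozen original model run on the prompt, the fused answer, and the student's own prefix, while student logits come from the model being trained on the same prefix without the answer.

```python
import math


def softmax(xs: list[float]) -> list[float]:
    # Numerically stable softmax.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    z = sum(exps)
    return [e / z for e in exps]


def topk_temperature_kl(teacher_logits: list[float], student_logits: list[float],
                        k: int = 2, temperature: float = 2.0) -> float:
    """KL(teacher || student) over the teacher's top-k tokens at one position.

    teacher_logits: frozen original model on prompt + fused answer + student prefix
        (answer-conditioned; an assumption about how conditioning is applied).
    student_logits: trained model on prompt + the same student-generated prefix
        (on-policy).
    """
    # Indices of the teacher's k most likely tokens.
    top = sorted(range(len(teacher_logits)), key=lambda i: teacher_logits[i], reverse=True)[:k]
    # Temperature-soften both distributions, renormalized over the top-k support.
    p = softmax([teacher_logits[i] / temperature for i in top])
    q = softmax([student_logits[i] / temperature for i in top])
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    # T^2 scaling keeps gradient magnitudes roughly comparable across temperatures.
    return temperature ** 2 * kl


def trajectory_loss(teacher_seq: list[list[float]], student_seq: list[list[float]],
                    k: int = 2, temperature: float = 2.0) -> float:
    """Average the per-position loss over one student-generated trajectory."""
    losses = [topk_temperature_kl(t, s, k, temperature) for t, s in zip(teacher_seq, student_seq)]
    return sum(losses) / len(losses)
```

Because only the teacher's top-K tokens enter the loss, a student that disagrees only on tokens the teacher considers unlikely (for example, teacher logits `[5, 4, 0]` versus student logits `[5, 4, 100]` with `k=2`) incurs zero loss, while the temperature softens the remaining distribution so lower-ranked tokens among the K still carry signal. A real implementation would operate on batched framework tensors rather than Python lists.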
Topics
- RAFT Framework
- Domain Fine-Tuning
- Supervised Fine-Tuning
- Catastrophic Forgetting
- Data Refinement
- Adaptive Distillation
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.