RAFT: Data Refinement and Adaptive Distillation for Domain Fine-Tuning with Alleviated Forgetting

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

RAFT is a two-stage framework that addresses the performance degradation caused by domain-specific supervised fine-tuning (SFT) by mitigating two gaps: supervision compatibility and trajectory preservation. The first stage refines the data, constructing model-compatible supervision through self-conditioned rewriting, semantic filtering, and answer fusion. The second stage applies Answer-Conditioned On-Policy Distillation: the original instruction-tuned model, guided by the fused answer, provides soft targets on trajectories generated by the student. The framework also integrates top-K temperature distillation and EMA-based adaptive loss balancing to stabilize the trade-off between domain and general performance. Evaluated across three instruction-tuned backbones and five domains, RAFT improved average domain accuracy by 23.2% over standard SFT. It also recovered SFT-induced degradation on MS-Bench and IFEval, with relative improvements of 18.2% and 10.2%, respectively.
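The summary names top-K temperature distillation without giving its exact form. The sketch below shows one common variant, assumed here rather than taken from the paper. At each position of a student-generated trajectory, it keeps the teacher's K highest logits, softens teacher and student logits with temperature T over that shared support, and takes KL(teacher || student) scaled by T^2. In RAFT's setting the teacher logits would come from the original instruction-tuned model conditioned on the fused answer. Here they are plain lists of floats, and all function names are hypothetical.

```python
import math
from typing import Sequence


def softmax(logits: Sequence[float], temperature: float) -> list[float]:
    """Numerically stable softmax of logits / temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    z = sum(exps)
    return [e / z for e in exps]


def topk_temperature_kd(teacher_logits: Sequence[float],
                        student_logits: Sequence[float],
                        k: int = 5,
                        temperature: float = 2.0,
                        eps: float = 1e-12) -> float:
    """KL(teacher || student) at one position, restricted to the teacher's top-k tokens.

    Both distributions are renormalized over the same k-token support and
    softened by the temperature; the result is scaled by temperature**2.
    """
    if len(teacher_logits) != len(student_logits):
        raise ValueError("teacher and student must share a vocabulary")
    k = min(k, len(teacher_logits))
    # Indices of the teacher's k highest-scoring tokens
    top = sorted(range(len(teacher_logits)),
                 key=lambda i: teacher_logits[i], reverse=True)[:k]
    p = softmax([teacher_logits[i] for i in top], temperature)
    q = softmax([student_logits[i] for i in top], temperature)
    # eps guards against log(0) if a student probability underflows
    kl = sum(pi * (math.log(pi) - math.log(max(qi, eps)))
             for pi, qi in zip(p, q) if pi > 0)
    return temperature ** 2 * kl


def sequence_kd_loss(teacher_seq, student_seq, k: int = 5,
                     temperature: float = 2.0) -> float:
    """Mean per-position loss over one student-generated trajectory.

    teacher_seq[t] and student_seq[t] are the vocabulary logits each model
    assigns at position t of the tokens the student sampled.
    """
    if len(teacher_seq) != len(student_seq) or not teacher_seq:
        raise ValueError("need equal-length, non-empty logit sequences")
    losses = [topk_temperature_kd(t, s, k, temperature)
              for t, s in zip(teacher_seq, student_seq)]
    return sum(losses) / len(losses)
```

Restricting the loss to the teacher's top K tokens concentrates it on continuations the teacher finds plausible and avoids summing over the full vocabulary. Whether RAFT renormalizes the student over the same support, as this sketch does, is an assumption.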

Key takeaway

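For machine learning engineers fine-tuning instruction-tuned models for specific domains, RAFT offers a way to reduce forgetting without giving up in-domain gains. Its two-stage framework first refines the domain data into supervision the model can absorb, then distills from the original model to preserve its behavior. In the reported experiments, this raised average domain accuracy by 23.2% over standard SFT while recovering general-capability losses on MS-Bench and IFEval. In your own SFT pipelines, the components to try are the data-refinement steps (self-conditioned rewriting, semantic filtering, answer fusion) together with answer-conditioned distillation and adaptive loss balancing, which jointly target a better domain-general trade-off.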

Key insights

RAFT improves domain fine-tuning by refining data and preserving original model behavior through adaptive distillation.
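The summary does not spell out how the EMA-based adaptive loss balancing weights its terms. One plausible reading, assumed here, tracks an exponential moving average (EMA) of each loss's magnitude and weights each term inversely to it. The SFT and distillation losses then contribute on a comparable scale even as their raw values drift during training. `EMALossBalancer` and the loss names "sft" and "kd" are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class EMALossBalancer:
    """Hypothetical balancer: weights each loss inversely to an EMA of its magnitude."""
    decay: float = 0.99
    eps: float = 1e-8
    ema: dict[str, float] = field(default_factory=dict)

    def update(self, losses: dict[str, float]) -> dict[str, float]:
        """Update the EMAs with scalar loss values and return per-loss weights."""
        for name, value in losses.items():
            if name not in self.ema:
                self.ema[name] = value  # initialize with the first observation
            else:
                self.ema[name] = self.decay * self.ema[name] + (1 - self.decay) * value
        # Inverse-magnitude weights, normalized so they sum to the number of losses
        inv = {n: 1.0 / (self.ema[n] + self.eps) for n in losses}
        total = sum(inv.values())
        return {n: len(losses) * v / total for n, v in inv.items()}

    def combine(self, losses: dict[str, float]) -> tuple[float, dict[str, float]]:
        """Return the weighted total loss and the weights used."""
        weights = self.update(losses)
        return sum(weights[n] * losses[n] for n in losses), weights


balancer = EMALossBalancer(decay=0.9)
total, weights = balancer.combine({"sft": 1.0, "kd": 4.0})
```

In a real training loop, you would feed detached scalar values (for example a PyTorch tensor's `.item()`) to `update` and multiply the returned weights into the graph-connected losses, so the weights themselves receive no gradient.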

Principles

Method

RAFT's method has two stages. The first constructs model-compatible supervision via self-conditioned rewriting, semantic filtering, and answer fusion. The second applies Answer-Conditioned On-Policy Distillation, in which the original instruction-tuned model, conditioned on the fused answer, provides soft targets on student-generated trajectories.
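The data-refinement stage can be pictured as a filter over candidate rewrites. In the sketch below, which is assumed rather than taken from the paper, the model's self-conditioned rewrites of a reference answer are passed in as strings (generation is out of scope). A bag-of-words cosine similarity stands in for whatever semantic scorer RAFT uses, and rewrites below a hypothetical threshold are dropped. The summary does not describe answer fusion, so `choose_target` only picks the best accepted rewrite and falls back to the original reference. Treat it as a placeholder, not RAFT's fusion step.

```python
import math
import re
from collections import Counter


def bow_cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words vectors (stand-in for an embedding model)."""
    ta = Counter(re.findall(r"\w+", a.lower()))
    tb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(ta[w] * tb[w] for w in ta)
    na = math.sqrt(sum(v * v for v in ta.values()))
    nb = math.sqrt(sum(v * v for v in tb.values()))
    return dot / (na * nb) if na and nb else 0.0


def filter_rewrites(reference: str, rewrites: list[str],
                    threshold: float = 0.5) -> list[tuple[float, str]]:
    """Keep rewrites semantically close to the reference, best-scoring first."""
    scored = [(bow_cosine(reference, r), r) for r in rewrites]
    kept = [(s, r) for s, r in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept


def choose_target(reference: str, rewrites: list[str],
                  threshold: float = 0.5) -> str:
    """Placeholder selection: best accepted rewrite, else the original reference."""
    kept = filter_rewrites(reference, rewrites, threshold)
    return kept[0][1] if kept else reference


target = choose_target("The capital of France is Paris.",
                       ["Paris is the capital of France.", "I like turtles."])
```

Falling back to the reference keeps every training example, so a poor rewrite never silently replaces correct supervision; a production filter would likely also check that the rewrite's final answer matches the reference.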

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.