PriFT: Prior-Support Guided Supervised Fine-Tuning
Summary
PriFT (Prior-support guided Fine-Tuning) targets a generalization weakness of supervised fine-tuning (SFT): its tendency to overfit by fitting misaligned tokens. Existing token-reweighting methods often entangle the weights with the optimization trajectory, which causes rapid divergence from the pretrained model. PriFT instead derives its token weights from a frozen pretrained reference model, which estimates the "prior support" for each target token and therefore gives a stable signal. Applied to existing reweighting rules, it consistently improves their performance. Two instantiations, PriFT-prob and PriFT-mass, achieve state-of-the-art results among SFT baselines in mathematical reasoning, code generation, and medical question answering, and also provide a better initialization for subsequent reinforcement learning training.
Key takeaway
If you adapt large language models or prepare them for reinforcement learning, consider PriFT-prob or PriFT-mass in place of standard SFT. Both are reported to reach state-of-the-art performance among SFT baselines in mathematical reasoning, code generation, and medical question answering. They mitigate overfitting, improve generalization, and give a better initialization for subsequent RL training.
Key insights
PriFT improves SFT generalization by reweighting tokens according to a frozen pretrained model's "prior support," which keeps the weighting signal stable throughout fine-tuning.
Principles
- SFT's off-policy objective can cause overfitting to misaligned tokens.
- Reweighting signals that stay stable during training improve fine-tuning.
- A frozen pretrained model's prior support for each target token provides such a signal.
Method
PriFT derives token weights from a frozen pretrained reference model to estimate "prior support," ensuring a stable reweighting signal unaffected by the fine-tuning process.
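The idea can be sketched in a few lines of plain Python. This summary does not give the exact formulas for PriFT-prob or PriFT-mass, so the weighting rule below is an assumption made for illustration: each target token is weighted by the probability the frozen reference model assigns to it, and the loss is normalized by the sum of weights. The function names (`prior_support_weights`, `weighted_sft_loss`) are hypothetical and do not come from the original article.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over one token's vocabulary logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def prior_support_weights(ref_logits, targets):
    """Assumed weighting rule: probability of each target token under the
    frozen reference model. These are constants, not trained quantities."""
    return [math.exp(log_softmax(row)[t]) for row, t in zip(ref_logits, targets)]


def weighted_sft_loss(policy_logits, ref_logits, targets):
    """Token-weighted negative log-likelihood of the model being fine-tuned.
    Normalizing by the sum of weights is an illustrative choice."""
    weights = prior_support_weights(ref_logits, targets)
    nll = [-log_softmax(row)[t] for row, t in zip(policy_logits, targets)]
    return sum(w * l for w, l in zip(weights, nll)) / sum(weights)


# Toy example: two positions, vocabulary of three tokens.
# The reference model supports the first target but not the second.
ref_logits = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]
policy_logits = [[1.0, 0.5, 0.0], [0.2, 0.1, 0.0]]
targets = [0, 0]

weights = prior_support_weights(ref_logits, targets)
loss = weighted_sft_loss(policy_logits, ref_logits, targets)
```

Because the weights depend only on the reference logits, they can be computed once before training and do not change as the fine-tuned model's parameters move. In this toy case the second target token, which the reference model considers unlikely, contributes far less to the loss than the first.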
In practice
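- Use PriFT for SFT on mathematical reasoning tasks.
- Apply PriFT when fine-tuning for code generation.
- Apply PriFT when fine-tuning for medical question answering.
- Use a PriFT-tuned model as the initialization for subsequent RL training.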
Topics
- Supervised Fine-Tuning
- Token Reweighting
- Pretrained Models
- Generalization
- Reinforcement Learning
- Mathematical Reasoning
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.