InfoSFT: Learn More and Forget Less with Information-Aware Token Weighting
Summary
InfoSFT is a token-weighting scheme designed to improve supervised fine-tuning (SFT) of large language models (LLMs) by focusing the learning signal on the most informative tokens. Standard SFT often overfits to low-likelihood samples, which shifts the policy and degrades prior capabilities. Existing methods filter or down-weight such data, but in doing so they risk suppressing novel behaviors. InfoSFT instead concentrates training updates on medium-confidence tokens: those the model has not already learned well, yet that are not so unlikely that fitting them causes instability. The method requires only a one-line modification to the standard token-wise loss. It is reported to generalize better than vanilla SFT and likelihood-weighted baselines across math, code, and chain-of-thought tasks, while also better preserving pre-existing model capabilities.
Key takeaway
For AI engineers and research scientists who develop or fine-tune LLMs, InfoSFT is a low-cost change to the SFT pipeline: a one-line modification to the token-wise loss. By concentrating updates on medium-confidence tokens, it is reported to generalize better than standard SFT and likelihood-weighted approaches on math, code, and chain-of-thought tasks, while better preserving the model's pre-existing capabilities.
Key insights
InfoSFT improves LLM fine-tuning by weighting tokens based on informativeness, balancing novelty and stability.
Principles
- Fitting every token uniformly, as standard SFT does, can overfit to low-likelihood samples.
- Focus learning on medium-confidence tokens.
- Preserve prior capabilities during fine-tuning.
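The contrast between these principles can be illustrated with toy weight profiles over a token's predicted probability p. The summary does not state InfoSFT's actual weighting formula, so everything below is an assumption made for illustration: `uniform_weight` stands for vanilla SFT, `likelihood_weight` (w = p) for one possible likelihood-weighted baseline, and `medium_confidence_weight` (w = 4p(1 - p)) is a hypothetical bell-shaped stand-in that peaks at p = 0.5.

```python
# Illustrative weight profiles. All function names are hypothetical and the
# formulas are assumptions, not the definitions used in the InfoSFT paper.

def uniform_weight(p: float) -> float:
    """Vanilla SFT: every token counts equally, whatever its probability."""
    return 1.0

def likelihood_weight(p: float) -> float:
    """Assumed likelihood-weighted baseline: unlikely tokens are down-weighted."""
    return p

def medium_confidence_weight(p: float) -> float:
    """Assumed stand-in: largest at p = 0.5, zero at p = 0 and p = 1."""
    return 4.0 * p * (1.0 - p)

if __name__ == "__main__":
    # Compare a very unlikely, a medium-confidence, and a well-learned token.
    for p in (0.01, 0.5, 0.99):
        print(
            f"p={p:.2f}  uniform={uniform_weight(p):.2f}  "
            f"likelihood={likelihood_weight(p):.2f}  "
            f"medium={medium_confidence_weight(p):.2f}"
        )
```

Under these assumed profiles, only the bell-shaped weight suppresses both ends at once: the likelihood weight still gives nearly full weight to tokens the model already predicts confidently, while the uniform weight does nothing to damp very unlikely tokens.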
Method
InfoSFT modifies the standard token-wise SFT loss with a principled weighting scheme that prioritizes maximally informative, medium-confidence tokens to enhance generalization and stability.
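As a rough sketch of what a one-line change to the token-wise loss could look like, the example below multiplies each token's negative log-likelihood by a weight before averaging. The weight `token_weight` (4p(1 - p)) is an assumption chosen only because it favors medium-confidence tokens; the summary does not give the paper's formula, and the names `token_weight` and `sft_loss` are hypothetical.

```python
import math
from typing import Sequence

def token_weight(p: float) -> float:
    # Assumed weight (not taken from the paper): largest at medium confidence.
    return 4.0 * p * (1.0 - p)

def sft_loss(token_probs: Sequence[float], weighted: bool = True) -> float:
    """Mean token-wise negative log-likelihood, optionally weighted.

    token_probs holds the model's probability for each target token and
    every entry must lie in (0, 1]. In an autodiff framework the weight
    would presumably be treated as a constant so that no gradient flows
    through it; that detail is also an assumption here.
    """
    total = 0.0
    for p in token_probs:
        nll = -math.log(p)
        w = token_weight(p) if weighted else 1.0  # the one-line change
        total += w * nll
    return total / len(token_probs)

if __name__ == "__main__":
    probs = [0.001, 0.5, 0.999]  # unlikely, medium-confidence, well-learned
    print("vanilla SFT loss:", sft_loss(probs, weighted=False))
    print("weighted loss:   ", sft_loss(probs, weighted=True))
```

With `weighted=False` the function reduces to the standard SFT objective, so the two variants can be compared on the same batch. Under the assumed weight, the very unlikely token dominates the vanilla loss but contributes little to the weighted one.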
In practice
- Apply InfoSFT for improved LLM generalization.
- Use InfoSFT to mitigate SFT policy shifts.
- Enhance fine-tuning across diverse tasks.
Topics
- Supervised Fine-Tuning
- Token Weighting
- InfoSFT
- Generalization Improvement
- Pre-existing Capability Preservation
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.