InfoSFT: Learn More and Forget Less with Information-Aware Token Weighting

Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

InfoSFT is a token-weighting scheme for supervised fine-tuning (SFT) of large language models (LLMs) that concentrates the learning signal on the most informative tokens. Standard SFT tends to overfit to low-likelihood samples, which shifts the policy and degrades prior capabilities. Existing methods filter or down-weight such data, but in doing so they risk suppressing novel behaviors. InfoSFT instead concentrates training updates on medium-confidence tokens: those the model neither already predicts confidently (and so has little to learn from) nor finds so unlikely that fitting them destabilizes training. The method requires only a one-line modification to the standard token-wise loss. It is reported to generalize better than vanilla SFT and likelihood-weighted baselines across math, code, and chain-of-thought tasks, while also better preserving pre-existing model capabilities.
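
As a rough formalization of the "one-line modification" mentioned above, the first line below is the standard token-wise SFT loss and the second is a generic token-weighted form. The symbols \pi_\theta, w, p_t, and T, and the 1/T normalization, are notation assumed here for illustration; the specific weight function used by InfoSFT is not given in this summary.

```latex
\begin{align}
\mathcal{L}_{\text{SFT}}(\theta) &= -\frac{1}{T}\sum_{t=1}^{T} \log \pi_\theta(y_t \mid x, y_{<t}) \\
\mathcal{L}_{w}(\theta) &= -\frac{1}{T}\sum_{t=1}^{T} w(p_t)\,\log \pi_\theta(y_t \mid x, y_{<t}),
\qquad p_t = \pi_\theta(y_t \mid x, y_{<t})
\end{align}
```

Under this reading, any w that is small both for p_t near 1 (already familiar tokens) and for p_t near 0 (very unlikely tokens) would match the description of emphasizing medium-confidence tokens.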

Key takeaway

For AI engineers and research scientists who fine-tune LLMs, InfoSFT is a low-cost change to an existing SFT pipeline: a one-line modification to the token-wise loss. By focusing learning on medium-confidence tokens, it is reported to improve generalization on math, code, and chain-of-thought tasks while preserving the model's pre-existing capabilities better than standard SFT or likelihood-weighted approaches.

Key insights

InfoSFT improves LLM fine-tuning by weighting tokens based on informativeness, balancing novelty and stability.

Method

InfoSFT modifies the standard token-wise SFT loss with a principled weighting scheme that prioritizes maximally informative, medium-confidence tokens to enhance generalization and stability.
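
To make the idea of a per-token weight concrete, here is a minimal sketch in plain Python. It is not the InfoSFT weighting itself, which this summary does not specify: the function `medium_confidence_weight` and its form `4 * p * (1 - p)` are hypothetical stand-ins chosen only because they peak at medium confidence and vanish for tokens the model finds either near-certain or near-impossible. The helper names and the mean-over-tokens normalization are likewise assumptions.

```python
import math
from typing import Sequence


def medium_confidence_weight(p: float) -> float:
    """Hypothetical weight (an assumption, not the InfoSFT formula).

    Equals 1 at p = 0.5 and falls to 0 at p = 0 and p = 1, so both
    very familiar and very unlikely tokens contribute little.
    """
    return 4.0 * p * (1.0 - p)


def vanilla_sft_loss(token_logprobs: Sequence[float]) -> float:
    """Standard token-wise SFT loss: mean negative log-likelihood."""
    return -sum(token_logprobs) / len(token_logprobs)


def weighted_sft_loss(token_logprobs: Sequence[float]) -> float:
    """Token-weighted SFT loss: mean of weight * negative log-likelihood.

    token_logprobs holds the log-probability the model assigns to each
    target token. In an autodiff framework the weight would typically be
    treated as a constant so that gradients flow only through the
    log-likelihood term; that detail is not modeled here.
    """
    total = 0.0
    for lp in token_logprobs:
        p = math.exp(lp)  # model confidence in the target token
        total += medium_confidence_weight(p) * (-lp)
    return total / len(token_logprobs)


if __name__ == "__main__":
    # Three target tokens: very familiar, medium-confidence, very unlikely.
    logprobs = [math.log(0.99), math.log(0.5), math.log(0.01)]
    print("vanilla :", vanilla_sft_loss(logprobs))
    print("weighted:", weighted_sft_loss(logprobs))
```

With these illustrative inputs, the very unlikely token dominates the vanilla loss, while in the weighted loss the medium-confidence token contributes the most. Relative to the unweighted version, the only change is the multiplication by the weight inside the sum.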

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.