Why Does RL Generalize Better Than SFT? A Data-Centric Perspective on VLM Post-Training

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital - Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, medium

Summary

Large-scale Vision-Language Models (VLMs) post-trained with Reinforcement Learning (RL) exhibit better out-of-distribution (OOD) generalization than those trained with Supervised Fine-Tuning (SFT). This phenomenon, reported on February 11, 2026, is attributed to RL's implicit data filtering, which prioritizes medium-difficulty training samples. The authors systematically evaluated SFT models across data difficulty levels and found that training on hard samples significantly degrades OOD performance. Based on this finding, they introduce Difficulty-Curated SFT (DC-SFT), which explicitly filters training data by sample difficulty. DC-SFT substantially improves OOD generalization over standard SFT and even surpasses RL-based training, while being more stable and computationally efficient. Code for DC-SFT is available on GitHub.

Key takeaway

If you are a research scientist or VLM engineer optimizing model generalization, consider adding Difficulty-Curated SFT (DC-SFT) to your post-training pipeline. The method explicitly filters training data by difficulty and offers a more stable and computationally efficient path to strong out-of-distribution performance than standard SFT or even RL-based approaches. Start by measuring your training data's difficulty distribution, then curate it to prioritize medium-difficulty samples.
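
As a rough illustration of the "evaluate your difficulty distribution" step, the sketch below scores each sample by how often a reference model fails it and counts samples per bucket. The summary does not say how DC-SFT measures difficulty, so the pass-rate proxy, the function names, and the 0.25 / 0.75 thresholds are all assumptions made for this example.

```python
from collections import Counter


def difficulty_from_pass_rate(outcomes):
    """Difficulty in [0, 1]: the fraction of sampled attempts that were wrong.

    `outcomes` is a list of booleans, one per attempt by a reference model.
    Using pass rate as a difficulty proxy is an assumption of this sketch.
    """
    if not outcomes:
        raise ValueError("need at least one attempt per sample")
    return 1.0 - sum(outcomes) / len(outcomes)


def bucket(difficulty, easy_max=0.25, hard_min=0.75):
    """Map a difficulty score to a coarse label (thresholds are hypothetical)."""
    if difficulty <= easy_max:
        return "easy"
    if difficulty >= hard_min:
        return "hard"
    return "medium"


def difficulty_histogram(attempts_by_sample):
    """Count how many samples fall into each difficulty bucket."""
    return Counter(
        bucket(difficulty_from_pass_rate(outcomes))
        for outcomes in attempts_by_sample.values()
    )


# Toy data: four samples, four attempts each (True = correct answer).
attempts = {
    "s1": [True, True, True, True],
    "s2": [True, False, True, False],
    "s3": [False, False, False, False],
    "s4": [True, True, False, True],
}
hist = difficulty_histogram(attempts)
print(dict(hist))
```

A histogram dominated by the "hard" bucket would, per the finding above, be a warning sign for OOD performance under plain SFT.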

Key insights

RL's OOD generalization advantage in VLMs stems from implicitly prioritizing medium-difficulty training data.
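
One way such implicit filtering can arise is sketched below. It assumes binary rewards and group-normalized advantages in the style of GRPO; the summary does not state which RL algorithm was studied, so treat this as an illustration of the general idea, not as the paper's analysis. When every rollout for a prompt gets the same reward (all correct or all wrong), the normalized advantages are zero and that prompt contributes no learning signal.

```python
import statistics


def group_relative_advantages(rewards):
    """Normalize rewards within one prompt's group of rollouts.

    Returns zeros when all rewards are identical, because there is no
    within-group contrast to learn from.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    if std == 0.0:
        return [0.0] * len(rewards)
    return [(r - mean) / std for r in rewards]


def signal_strength(rewards):
    """Mean absolute advantage: a crude proxy for how much a prompt is used."""
    advantages = group_relative_advantages(rewards)
    return sum(abs(a) for a in advantages) / len(advantages)


# Hypothetical rollout rewards for three prompts (1.0 = correct, 0.0 = wrong).
always_solved = [1.0, 1.0, 1.0, 1.0]
sometimes_solved = [1.0, 0.0, 1.0, 0.0]
never_solved = [0.0, 0.0, 0.0, 0.0]

for name, rewards in [
    ("always_solved", always_solved),
    ("sometimes_solved", sometimes_solved),
    ("never_solved", never_solved),
]:
    print(name, signal_strength(rewards))
```

Under these assumptions, only prompts with mixed outcomes produce a nonzero signal.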

Principles

Method

Difficulty-Curated SFT (DC-SFT) explicitly filters VLM training data based on sample difficulty to enhance OOD generalization, outperforming standard SFT and RL.
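
The filtering step can be sketched as a band-pass over per-sample difficulty scores. This is a minimal sketch, not the released DC-SFT code: the `Sample` type, the `curate_by_difficulty` helper, and the default band of 0.25 to 0.75 are hypothetical, and the scores are assumed to be precomputed on a 0-to-1 scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sample:
    sample_id: str
    difficulty: float  # assumed scale: 0.0 = always solved, 1.0 = never solved


def curate_by_difficulty(samples, min_difficulty=0.25, max_difficulty=0.75):
    """Keep samples whose difficulty lies inside the inclusive band.

    The band edges are illustrative defaults, not values from the source.
    """
    if not 0.0 <= min_difficulty <= max_difficulty <= 1.0:
        raise ValueError("expected 0 <= min_difficulty <= max_difficulty <= 1")
    return [
        s for s in samples
        if min_difficulty <= s.difficulty <= max_difficulty
    ]


# Toy pool with precomputed difficulty scores.
pool = [
    Sample("easy-1", 0.05),
    Sample("medium-1", 0.40),
    Sample("medium-2", 0.60),
    Sample("hard-1", 0.95),
]
curated = curate_by_difficulty(pool)
print([s.sample_id for s in curated])
```

The curated list would then feed an ordinary SFT run, which is where the stability and compute savings relative to RL would come from.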

Best for: Computer Vision Engineer, Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.