Rethinking Generalization in Reasoning SFT: A Conditional Analysis on Optimization, Data, and Model Capability

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new study challenges the common belief that Supervised Fine-Tuning (SFT) mainly produces memorization while Reinforcement Learning (RL) fosters generalization in Large Language Models (LLMs). The researchers found that cross-domain generalization in reasoning SFT, particularly with long Chain-of-Thought (CoT) supervision, is conditional: it depends on optimization dynamics, training data quality, and the base model's inherent capability. They observed a "dip-and-recovery" pattern in which cross-domain performance initially degrades and then improves with extended training, so evaluations taken after only a short training run can misrepresent generalization. High-quality, verified long-CoT traces consistently improved cross-domain reasoning, while low-quality solutions were detrimental. Stronger models could internalize transferable procedural patterns, even from simple tasks, whereas weaker models merely mimicked surface-level verbosity. The generalization is also asymmetric: reasoning improves, but safety degrades.

Key takeaway

For AI Engineers and Research Scientists evaluating LLM post-training strategies: reasoning SFT can achieve cross-domain generalization, but only with careful attention to training duration and data quality. Do not conclude that SFT has failed on the basis of early training checkpoints, as performance may recover. Focus on curating high-quality, long Chain-of-Thought data and use more capable base models to foster robust reasoning generalization, while monitoring for safety degradation.
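
As a rough illustration of the checkpoint advice, the sketch below labels a curve of cross-domain scores relative to the base model's score. It is not the study's method: the function name `classify_trajectory`, the noise band `tol`, and all of the numbers are hypothetical and chosen only to show how an early checkpoint and the full run can lead to different conclusions.

```python
from typing import Sequence


def classify_trajectory(scores: Sequence[float], baseline: float, tol: float = 0.5) -> str:
    """Label a cross-domain score curve relative to the base model's score.

    scores:   held-out (cross-domain) benchmark scores, one per checkpoint,
              in training order.
    baseline: the base model's score on the same benchmark before SFT.
    tol:      assumed noise band; differences within it count as no change.
    """
    if not scores:
        raise ValueError("need at least one checkpoint score")

    final = scores[-1]
    # Did any checkpoint fall clearly below the base model?
    dipped = min(scores) < baseline - tol

    if final < baseline - tol:
        return "degraded"
    if final > baseline + tol:
        return "dip-and-recovery" if dipped else "improved"
    return "flat"


# Made-up numbers for illustration only (not results from the study).
baseline_score = 40.0
checkpoint_scores = [36.0, 33.5, 38.0, 42.5, 45.0]

early_verdict = classify_trajectory(checkpoint_scores[:2], baseline_score)
full_verdict = classify_trajectory(checkpoint_scores, baseline_score)
print(early_verdict, full_verdict)
```

With these invented scores, stopping after the second checkpoint would be read as degradation, while the full curve is labeled as dip-and-recovery. In practice the threshold and the choice of held-out benchmarks would need to reflect actual evaluation noise.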

Key insights

Reasoning SFT can generalize across domains, but only conditionally: the outcome depends on optimization, data quality, and model capability.

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.