SFT-GRPO Data Overlap as a Post-Training Hyperparameter for Autoformalization

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

A controlled study investigated the impact of data overlap between Supervised Fine-Tuning (SFT) and Group Relative Policy Optimization (GRPO) stages on autoformalization models. Using a Qwen3-8B model, researchers evaluated six training configurations for Lean 4 autoformalization, varying the overlap of GRPO prompts with the SFT corpus at 0%, 30%, and 100%. The study found that lower overlap consistently led to higher compilation and semantic accuracy on Gaokao-Formal and PutnamBench. Specifically, 0% overlap yielded a 10.4 percentage-point semantic gain over SFT alone on Gaokao, while 100% overlap rendered the GRPO stage redundant. The research also highlighted significant compile-semantic gaps, exceeding 30 percentage points for high-compiling models, which are undetectable with compile-only evaluation.
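
The overlap settings described above can be illustrated with a minimal sketch of how a GRPO prompt pool with a chosen SFT overlap might be assembled. This is not the study's code: the function name `build_grpo_prompts`, the id-based representation of problems, and the fixed-seed sampling are all assumptions made for illustration.

```python
import random


def build_grpo_prompts(sft_ids, fresh_ids, n_prompts, overlap, seed=0):
    """Sample n_prompts prompt ids; a fraction `overlap` comes from the SFT corpus.

    Hypothetical helper: `sft_ids` are problems already used for SFT,
    `fresh_ids` are candidate problems intended to be unseen.
    """
    if not 0.0 <= overlap <= 1.0:
        raise ValueError("overlap must be in [0, 1]")

    sft_set = set(sft_ids)
    # Guard against leakage: drop any "fresh" item that also appears in the SFT corpus.
    fresh = sorted(set(fresh_ids) - sft_set)
    seen = sorted(sft_set)

    n_seen = round(n_prompts * overlap)
    n_fresh = n_prompts - n_seen
    if n_seen > len(seen) or n_fresh > len(fresh):
        raise ValueError("not enough items to reach the requested overlap")

    rng = random.Random(seed)
    prompts = rng.sample(seen, n_seen) + rng.sample(fresh, n_fresh)
    rng.shuffle(prompts)
    return prompts


sft = [f"s{i}" for i in range(20)]
fresh = [f"f{i}" for i in range(20)] + ["s0"]  # "s0" is a deliberate leak
for frac in (0.0, 0.3, 1.0):
    pool = build_grpo_prompts(sft, fresh, n_prompts=10, overlap=frac)
    print(frac, sum(p in set(sft) for p in pool))
```

Deduplicating by id only catches exact repeats; near-duplicate problems would need a separate similarity check, which this sketch does not attempt.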

Key takeaway

For AI Engineers developing autoformalization systems, explicitly managing data overlap between SFT and GRPO stages is critical. You should prioritize constructing disjoint data pools for GRPO training when possible, as this strategy consistently improves both compilation and semantic accuracy without additional computational cost. Furthermore, always employ dual-metric evaluation (compile pass@k and semantic pass@k) to avoid overlooking significant semantic inaccuracies in high-compiling models.
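
As a rough illustration of dual-metric evaluation, the sketch below computes compile pass@k, semantic pass@k, and the gap between them from per-sample verdicts. The `Sample` record, the "any of the first k samples" definition of pass@k, and the rule that a sample only counts as semantically correct if it also compiles are assumptions; this summary does not state the estimator the study used.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    compiles: bool     # the Lean 4 compiler accepted the formal statement
    semantic_ok: bool  # a judge deemed it faithful to the informal problem


def pass_at_k(problems, k, passed):
    """Fraction of problems where at least one of the first k samples passes."""
    if not problems:
        return 0.0
    hits = sum(any(passed(s) for s in samples[:k]) for samples in problems)
    return hits / len(problems)


def dual_metrics(problems, k):
    """Report both metrics plus the compile-semantic gap."""
    compile_rate = pass_at_k(problems, k, lambda s: s.compiles)
    # Assumption: semantic credit requires that the sample also compiles.
    semantic_rate = pass_at_k(problems, k, lambda s: s.compiles and s.semantic_ok)
    return {
        "compile": compile_rate,
        "semantic": semantic_rate,
        "gap": compile_rate - semantic_rate,
    }


# Toy data: four problems, one sample each.
problems = [
    [Sample(True, False)],
    [Sample(True, True)],
    [Sample(False, False)],
    [Sample(True, False)],
]
print(dual_metrics(problems, k=1))
```

On this toy data, three of four outputs compile but only one is semantically correct, which is the kind of gap a compile-only report would hide.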

Key insights

Keeping the SFT and GRPO data pools disjoint improved both compilation and semantic accuracy at no extra compute cost.

Method

The study used a dual-stage reward function for GRPO, combining a compiler gate with a continuous semantic judge (Gemini Flash 3) to provide indirect semantic supervision for Lean 4 autoformalization.
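
One plausible reading of a compiler gate combined with a continuous semantic judge is sketched below. The exact reward form is not given in this summary, so the zero reward for non-compiling outputs, the clamping of the judge score to [0, 1], and the names `gated_reward` and `group_relative_advantages` are assumptions. The second function shows the standard GRPO step of normalizing rewards within a group of samples for the same prompt; the study's specific variant may differ.

```python
def gated_reward(compiles: bool, judge_score: float) -> float:
    """Stage 1: compiler gate. Stage 2: continuous semantic score in [0, 1]."""
    if not compiles:
        return 0.0  # assumed: outputs that fail to compile earn nothing
    return min(1.0, max(0.0, judge_score))


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one prompt's sample group (mean 0, unit scale)."""
    mean = sum(rewards) / len(rewards)
    var = sum((r - mean) ** 2 for r in rewards) / len(rewards)
    std = var ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled formalizations for one prompt: (compiles, judge score).
group = [(False, 0.9), (True, 0.2), (True, 0.8), (True, 0.8)]
rewards = [gated_reward(c, s) for c, s in group]
print(rewards)
print(group_relative_advantages(rewards))
```

Because the gate zeroes out non-compiling samples regardless of the judge score, the semantic signal only differentiates among outputs that Lean 4 accepts.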

Best for: AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.