Achieving Gold-Medal-Level Olympiad Reasoning via Simple and Unified Scaling

Source: Artificial Intelligence · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

SU-01 is a new reasoning model that achieves gold-medal-level performance on International Mathematical Olympiad (IMO) and International Physics Olympiad (IPhO) problems. It is built on a 30B-A3B backbone and trained with a unified recipe that converts a post-trained reasoning backbone into a rigorous olympiad-level solver. Training starts with Supervised Fine-Tuning (SFT) under a reverse-perplexity curriculum on approximately 340K trajectories of under 8K tokens each, followed by a two-stage Reinforcement Learning (RL) pipeline with 200 RL steps. The recipe instills rigorous proof-search and self-checking behaviors, with the RL stages scaling from verifiable rewards to proof-level RL. The model also uses test-time scaling to boost solving performance, which enables stable reasoning on complex problems with trajectories over 100K tokens, and it generalizes strongly to scientific domains beyond its core training.

Key takeaway

For AI engineers developing advanced reasoning systems, this unified recipe offers a clear path toward expert-level performance in complex problem-solving domains. Consider integrating a reverse-perplexity SFT curriculum and a two-stage RL pipeline to instill robust proof-search and self-checking capabilities, especially in models that tackle long-horizon scientific or mathematical problems. The approach can improve model stability and generalization, both of which matter for competitive benchmarks such as olympiads.

Key insights

A unified recipe scales reasoning models to gold-medal Olympiad performance via SFT and two-stage RL.

Principles

Method

The recipe has three parts: SFT with a reverse-perplexity curriculum, a two-stage RL pipeline that progresses from verifiable rewards to proof-level RL, and test-time scaling at inference.
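
The SFT curriculum step might be sketched as below. This is a minimal illustration, not the authors' implementation: the summary does not define "reverse-perplexity", so the sketch assumes it means presenting trajectories from highest to lowest perplexity under the backbone model. The `Trajectory` type, the function names, and the 8192-token cutoff (standing in for "sub-8K") are all hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Trajectory:
    """Hypothetical container for one SFT example."""
    text: str
    # Natural-log probability of each token under the backbone model,
    # assumed to have been computed in a separate scoring pass.
    token_logprobs: List[float]


def perplexity(traj: Trajectory) -> float:
    """Perplexity = exp of the mean negative log-likelihood per token."""
    n = len(traj.token_logprobs)
    if n == 0:
        raise ValueError("trajectory has no tokens")
    return math.exp(-sum(traj.token_logprobs) / n)


def reverse_perplexity_curriculum(
    trajectories: List[Trajectory],
    max_tokens: int = 8192,       # assumption: "sub-8K" read as < 8192 tokens
    highest_first: bool = True,   # assumption: "reverse" = high perplexity first
) -> List[Trajectory]:
    """Drop over-long or empty trajectories, then order the rest by perplexity."""
    kept = [t for t in trajectories if 0 < len(t.token_logprobs) < max_tokens]
    return sorted(kept, key=perplexity, reverse=highest_first)
```

If the intended ordering is the opposite one, passing `highest_first=False` flips it without touching the length filter.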

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.