MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling

2026-06-11 · Source: Computation and Language · Field: Science & Research — Mathematics & Computational Sciences, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

MaxProof is a population-level test-time scaling framework for competition-level mathematical proof, built on the MiniMax-M3 series. M3 is first trained on three proof-oriented capabilities: generation, verification, and critique-conditioned repair. Training relies on a defense-in-depth generative verifier to keep the false-positive rate low. At test time, MaxProof uses the same M3 model as generator, verifier, refiner, and ranker: it searches over a population of candidate proofs and uses tournament selection to return a single final proof. With this test-time scaling, M3 scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, results reported as clearing the human gold-medal threshold for both competitions.

Key takeaway

For AI scientists building reasoning systems, MaxProof shows one route to gold-medal-level results in mathematical proof. You should consider pairing generative-verifier reinforcement learning with population-level test-time scaling: train one model to generate, verify, and repair, then spend test-time compute on a population of candidates instead of a single attempt. The result suggests that a low-false-positive verifier combined with iterative refinement and a final ranking step is a workable blueprint for high-stakes applications where an incorrect answer that passes review is costly.

Key insights

MaxProof combines generative-verifier RL with population-level test-time scaling to achieve human gold-medal performance in mathematical proof.

Principles

Defense-in-depth verification minimizes false positives.
Searching over a population of candidate proofs, rather than a single attempt, improves the final proof.
Tournament selection picks the single proof to return from that population.
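
The first principle can be illustrated with a small sketch. The summary does not say how MaxProof's verifier layers are built, so everything here is an assumption: the layers are modeled as hypothetical boolean checks, a proof is accepted only if every check passes, and the false-positive estimate holds only if the checks err independently.

```python
from typing import Callable, Iterable, Sequence


def accept_proof(proof: str, checks: Iterable[Callable[[str], bool]]) -> bool:
    """Accept only if every verification layer passes (hypothetical layers).

    A single rejection vetoes the proof, which trades more false negatives
    for fewer false positives.
    """
    return all(check(proof) for check in checks)


def combined_false_positive_rate(per_check_rates: Sequence[float]) -> float:
    """Estimate the chance that a wrong proof passes every layer.

    Assumption: the layers make errors independently, so the rates multiply.
    Correlated layers would give a higher rate than this estimate.
    """
    rate = 1.0
    for p in per_check_rates:
        rate *= p
    return rate
```

Under the independence assumption, three layers that each wrongly accept 10% of bad proofs would jointly accept about 0.1%. Real verifier passes drawn from the same model are likely correlated, so this should be read as a best case.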

Method

Train M3 for proof generation, verification, and critique-conditioned repair using a generative verifier. At test time, use M3 as generator, verifier, refiner, and ranker to search over candidate proofs and select the final one by tournament.
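
The test-time loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the MaxProof implementation: the four roles are passed in as hypothetical callables (`generate`, `verify`, `refine`, `better`), the population size and number of rounds are arbitrary, and the tournament is a plain single-elimination bracket. The toy run at the end replaces proofs with integers so the control flow can be executed without a model.

```python
from typing import Any, Callable, List, Tuple

Proof = Any  # a proof string in practice; an integer in the toy run below


def tournament(problem: Any, candidates: List[Proof],
               better: Callable[[Any, Proof, Proof], Proof]) -> Proof:
    """Single-elimination bracket: pairwise winners advance until one remains."""
    if not candidates:
        raise ValueError("tournament needs at least one candidate")
    pool = list(candidates)
    while len(pool) > 1:
        winners = [better(problem, pool[i], pool[i + 1])
                   for i in range(0, len(pool) - 1, 2)]
        if len(pool) % 2 == 1:
            winners.append(pool[-1])  # the unpaired candidate gets a bye
        pool = winners
    return pool[0]


def population_search(problem: Any,
                      generate: Callable[[Any], Proof],
                      verify: Callable[[Any, Proof], Tuple[bool, str]],
                      refine: Callable[[Any, Proof, str], Proof],
                      better: Callable[[Any, Proof, Proof], Proof],
                      population_size: int = 4,
                      rounds: int = 3) -> Proof:
    """Generate a population, repair rejected members, then rank by tournament."""
    population = [generate(problem) for _ in range(population_size)]
    for _ in range(rounds):
        updated = []
        for proof in population:
            accepted, critique = verify(problem, proof)
            # Keep accepted proofs; repair rejected ones using the critique.
            updated.append(proof if accepted else refine(problem, proof, critique))
        population = updated
    # Prefer verifier-accepted proofs; fall back to everyone if none pass.
    finalists = [p for p in population if verify(problem, p)[0]] or population
    return tournament(problem, finalists, better)


def toy_run() -> int:
    """Toy stand-in: a 'proof' is an integer and is correct when it equals 10."""
    target = 10
    seeds = iter([7, 12, 3, 10])
    return population_search(
        problem=target,
        generate=lambda problem: next(seeds),
        verify=lambda problem, p: (p == problem, "low" if p < problem else "high"),
        refine=lambda problem, p, critique: p + 1 if critique == "low" else p - 1,
        better=lambda problem, a, b: a if abs(a - problem) <= abs(b - problem) else b,
    )
```

In the toy run, three of the four candidates reach the target within three repair rounds and the fourth is filtered out before the tournament. Passing the roles as callables keeps the sketch independent of any model API; in the setup the summary describes, all four would be calls to the same M3 model with different prompts.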

In practice

Apply generative-verifier RL for complex reasoning tasks.
Implement population-level search for robust solution finding.
Use tournament selection to choose a final answer among AI-generated candidates.

Topics

Mathematical Proof
Generative-Verifier RL
Population-Level Scaling
MiniMax-M3 Series
AI Reasoning
Automated Theorem Proving

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.