MaxProof: Scaling Mathematical Proof with Generative-Verifier RL and Population-Level Test-Time Scaling
Summary
MaxProof is a population-level test-time scaling framework for competition-level mathematical proof, built for the MiniMax-M3 series. The M3 model is first trained for three proof-oriented capabilities: generation, verification, and critique-conditioned repair. Training relies on a defense-in-depth generative verifier to keep the false-positive rate low. At test time, MaxProof uses the same M3 model as generator, verifier, refiner, and ranker: it searches over a population of candidate proofs and uses tournament selection to return a single final proof. With this test-time scaling, M3 scores 35/42 on IMO 2025 and 36/42 on USAMO 2026, surpassing the human gold-medal threshold for both competitions.
Key takeaway
For AI scientists building reasoning systems, MaxProof shows one route to gold-medal-level performance on competition proofs: train a single model to generate, verify, and repair proofs with generative-verifier reinforcement learning, then spend test-time compute on a population of candidate proofs rather than a single sample. The results suggest that combining these capabilities with iterative refinement is worth considering for other high-stakes reasoning applications.
Key insights
MaxProof combines generative-verifier RL with population-level test-time scaling to achieve human gold-medal performance in mathematical proof.
Principles
- Defense-in-depth verification minimizes false positives.
- Population-level search enhances proof quality.
- Tournament selection picks the single final proof from the candidate population.
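The first principle can be illustrated with a small sketch. This summary does not describe how the MaxProof verifier is layered, so the three checks below are hypothetical stand-ins; the only point is the acceptance rule, in which any single layer can veto a proof. Under the idealized assumption that layers err independently, the combined false-positive rate is the product of the per-layer rates, so three layers at 10% each would give 0.1%.

```python
import math
from typing import Callable, List

# A check takes a proof string and returns True when it finds no problem.
Check = Callable[[str], bool]


def defense_in_depth_accept(proof: str, checks: List[Check]) -> bool:
    """Accept a proof only if every layer passes; one rejection is a veto."""
    return all(check(proof) for check in checks)


def combined_false_positive_rate(per_layer_rates: List[float]) -> float:
    """Product of per-layer rates. Assumes layers err independently,
    which is an idealization and not a claim about the real verifier."""
    return math.prod(per_layer_rates)


# Hypothetical layers, for illustration only.
def has_conclusion(proof: str) -> bool:
    return "QED" in proof


def avoids_hand_waving(proof: str) -> bool:
    return "clearly" not in proof.lower()


def is_long_enough(proof: str) -> bool:
    return len(proof.split()) >= 5


LAYERS: List[Check] = [has_conclusion, avoids_hand_waving, is_long_enough]

good = "Assume n is even. Then n = 2k, so n^2 = 4k^2 is even. QED"
bad = "Clearly the claim holds for all n by inspection. QED"

accepted_good = defense_in_depth_accept(good, LAYERS)
accepted_bad = defense_in_depth_accept(bad, LAYERS)
```

In this sketch `accepted_bad` is false even though two of the three layers pass, which is the behavior a low-false-positive verifier needs: a proof must survive every check to count as verified.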
Method
Train M3 for generation, verification, and repair using a generative verifier. At test time, use M3 as generator, verifier, refiner, and ranker to search candidate proofs and select the best via tournament.
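The test-time loop described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `generate`, `verify`, and `repair` are hypothetical stubs standing in for calls to the same model in its generator, verifier, and refiner roles, and the population size and round count are arbitrary. A toy "proof" here is a string whose `[gap]` markers represent unjustified steps.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Candidate:
    proof: str
    verified: bool = False
    critique: str = ""


def generate(problem: str, rng: random.Random) -> str:
    """Stub generator: a draft with 0 to 3 unjustified steps."""
    return f"proof of {problem}" + " [gap]" * rng.randint(0, 3)


def verify(proof: str) -> Tuple[bool, str]:
    """Stub verifier: returns (passed, critique)."""
    gaps = proof.count("[gap]")
    return gaps == 0, (f"{gaps} unjustified step(s)" if gaps else "")


def repair(proof: str, critique: str) -> str:
    """Stub critique-conditioned repair: fixes one gap per call."""
    return proof.replace(" [gap]", "", 1)


def population_search(problem: str, population_size: int = 8,
                      rounds: int = 3, seed: int = 0) -> List[Candidate]:
    rng = random.Random(seed)
    population = [Candidate(generate(problem, rng))
                  for _ in range(population_size)]
    for _ in range(rounds):
        for cand in population:
            if cand.verified:
                continue
            cand.verified, cand.critique = verify(cand.proof)
            if not cand.verified:
                # Feed the verifier's critique back into the refiner.
                cand.proof = repair(cand.proof, cand.critique)
    # Re-check candidates that were repaired in the last round.
    for cand in population:
        cand.verified, cand.critique = verify(cand.proof)
    return [c for c in population if c.verified]


survivors = population_search("P1")
```

The verified survivors would then go to the ranker for tournament selection. Keeping a whole population alive, rather than repairing one proof repeatedly, means a draft that cannot be fixed does not block the search.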
In practice
- Apply generative-verifier RL for complex reasoning tasks.
- Implement population-level search for robust solution finding.
- Use tournament selection to choose among AI-generated outputs.
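As an illustration of the last point, here is a minimal single-elimination tournament. The summary does not say which bracket format MaxProof uses, so single elimination is an assumption, and `prefer_shorter` is a hypothetical stand-in for a model acting as pairwise ranker.

```python
from typing import Callable, List

# A ranker compares two candidates and returns the one it prefers.
Ranker = Callable[[str, str], str]


def tournament_select(candidates: List[str], better: Ranker) -> str:
    """Single-elimination tournament: pair candidates, keep each winner,
    and repeat until one remains. An unpaired candidate gets a bye."""
    if not candidates:
        raise ValueError("need at least one candidate")
    pool = list(candidates)
    while len(pool) > 1:
        next_round = []
        for i in range(0, len(pool) - 1, 2):
            next_round.append(better(pool[i], pool[i + 1]))
        if len(pool) % 2 == 1:
            next_round.append(pool[-1])  # odd one out advances unchallenged
        pool = next_round
    return pool[0]


def prefer_shorter(a: str, b: str) -> str:
    """Hypothetical ranker: prefers the shorter text, first on ties."""
    return a if len(a) <= len(b) else b


winner = tournament_select(["aaaa", "bb", "c", "ddd", "ee"], prefer_shorter)
```

A tournament needs only pairwise judgments, which are often easier for a model to make than absolute scores, and it uses n - 1 comparisons for n candidates.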
Topics
- Mathematical Proof
- Generative-Verifier RL
- Population-Level Scaling
- MiniMax-M3 Series
- AI Reasoning
- Automated Theorem Proving
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.