OProver: A Unified Framework for Agentic Formal Theorem Proving
Summary
OProver is a new unified framework for agentic formal theorem proving in Lean 4 that builds iterative proof revision into training rather than treating it as an inference-time heuristic. It uses retrieved compiler-verified proofs and Lean compiler feedback to refine failed proof attempts. The framework is trained through continued pretraining on Lean code and mathematics, followed by iterative post-training that combines agentic proving, supervised fine-tuning (SFT) on repair trajectories, and reinforcement learning (RL) on unresolved cases. OProver is paired with OProofs, a large-scale corpus containing 1.77M Lean statements, 6.86M compiler-verified proofs, and serialized proving trajectories that record retrieved context, failed attempts, feedback, and repairs. OProver-32B achieves state-of-the-art Pass@32 scores on MiniF2F (93.3%), ProverBench (58.2%), and PutnamBench (11.3%), and ranks second on MathOlympiad (22.8%) and ProofNet (33.2%), outperforming prior open-weight whole-proof provers.
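The summary does not spell out how Pass@32 is computed. A minimal sketch, assuming the common convention that a problem counts as solved when at least one of its first k attempts is accepted by the compiler, might look like this. The function name `pass_at_k` and the data in `toy_results` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def pass_at_k(results: Dict[str, List[bool]], k: int) -> float:
    """Fraction of problems with at least one verified attempt among the first k.

    Assumption: this "any of k" convention is used here for illustration only;
    the paper may define or estimate Pass@k differently.
    """
    if not results:
        raise ValueError("no problems to score")
    solved = sum(1 for attempts in results.values() if any(attempts[:k]))
    return solved / len(results)


# Hypothetical per-problem outcomes (True = the compiler accepted that attempt).
toy_results = {
    "problem_a": [False, True, False],
    "problem_b": [False, False, False],
    "problem_c": [True],
    "problem_d": [False, False, True],
}

score_at_1 = pass_at_k(toy_results, 1)
score_at_3 = pass_at_k(toy_results, 3)
```

Under this convention the score can only stay flat or rise as k grows, which is why figures reported at different k are not directly comparable.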
Key takeaway
For AI scientists and machine learning engineers developing formal theorem provers, OProver shows that training on multi-round, feedback-conditioned refinement, rather than adding it as a post-hoc augmentation, pays off: the reported results include state-of-the-art Pass@32 on MiniF2F, ProverBench, and PutnamBench. Focus on building systems that learn from iterative repair trajectories and compiler feedback, and consider developing corpora that grow with your prover's capabilities.
Key insights
Integrating agentic proving and compiler feedback directly into training significantly enhances formal theorem proving performance.
Principles
- Iterative refinement improves proof success rates.
- Compiler feedback is a rich supervision source.
- Co-evolution of prover and corpus strengthens both.
Method
OProver trains a policy to iteratively revise proofs using retrieved context and Lean 4 compiler feedback. Training combines continued pretraining, SFT on repair trajectories, and RL on hard cases, and newly verified proofs are added back to the corpus.
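The revise-and-check loop described above can be sketched roughly as follows. This is an illustrative outline, not the authors' implementation: `refine`, `Attempt`, `Trajectory`, and the `toy_*` functions are hypothetical names, the checker is a stand-in that never calls the real Lean compiler, and the proof strings are treated as opaque text.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    proof: str
    ok: bool
    feedback: str  # checker diagnostics; empty when the proof is accepted


@dataclass
class Trajectory:
    statement: str
    retrieved: List[str]
    attempts: List[Attempt] = field(default_factory=list)

    @property
    def solved(self) -> bool:
        return bool(self.attempts) and self.attempts[-1].ok


def refine(
    statement: str,
    retrieve: Callable[[str], List[str]],
    propose: Callable[[str, List[str], List[Attempt]], str],
    check: Callable[[str, str], Tuple[bool, str]],
    max_rounds: int = 4,
) -> Trajectory:
    """Propose a proof, check it, and retry with the feedback until it passes."""
    traj = Trajectory(statement, retrieve(statement))
    for _ in range(max_rounds):
        # The proposer sees the retrieved proofs and every earlier failure.
        proof = propose(statement, traj.retrieved, traj.attempts)
        ok, feedback = check(statement, proof)
        traj.attempts.append(Attempt(proof, ok, feedback))
        if ok:
            break
    return traj


# Toy stand-ins (assumptions): a real system would query a retrieval index,
# sample from the prover model, and run the Lean 4 compiler.
def toy_retrieve(statement: str) -> List[str]:
    return ["<retrieved verified proof placeholder>"]


def toy_propose(statement: str, retrieved: List[str], history: List[Attempt]) -> str:
    return "sorry" if not history else "rfl"


def toy_check(statement: str, proof: str) -> Tuple[bool, str]:
    ok = "sorry" not in proof
    return ok, "" if ok else "toy diagnostic: proof still contains a placeholder"


traj = refine("theorem demo : 2 + 2 = 4", toy_retrieve, toy_propose, toy_check)
```

Keeping every attempt and its feedback in the returned trajectory, rather than only the final proof, is what would make the run reusable as training data later.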
In practice
- Use multi-round refinement for complex proofs.
- Incorporate compiler diagnostics for targeted repair.
- Expand training data with agentic proving traces.
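One way to act on these points is to sort finished proving runs into three destinations: verified proofs go to the corpus, repaired failures become SFT examples, and unsolved statements are queued for RL. The sketch below assumes a simple dictionary format for trajectories and pairs each failed attempt with the final verified proof; both choices are illustrative assumptions, and `route_trajectories` is a hypothetical name, not part of OProver or OProofs.

```python
from typing import Dict, List, Tuple


def route_trajectories(
    trajectories: List[dict],
    corpus: Dict[str, List[str]],
) -> Tuple[List[dict], List[str]]:
    """Split proving runs into SFT repair examples and an unresolved pool.

    Verified final proofs are added to `corpus` in place.
    """
    sft_examples: List[dict] = []
    unresolved: List[str] = []
    for traj in trajectories:
        attempts = traj["attempts"]
        if attempts and attempts[-1]["ok"]:
            final_proof = attempts[-1]["proof"]
            proofs = corpus.setdefault(traj["statement"], [])
            if final_proof not in proofs:
                proofs.append(final_proof)  # corpus grows with each new verified proof
            # Assumption: every failed attempt is paired with the final verified proof.
            for failed in attempts[:-1]:
                sft_examples.append({
                    "statement": traj["statement"],
                    "retrieved": traj["retrieved"],
                    "failed_proof": failed["proof"],
                    "feedback": failed["feedback"],
                    "repair": final_proof,
                })
        else:
            unresolved.append(traj["statement"])  # candidates for RL on hard cases
    return sft_examples, unresolved


# Hypothetical trajectories; the strings are placeholders, not real Lean output.
toy_trajectories = [
    {
        "statement": "s1",
        "retrieved": ["r1"],
        "attempts": [
            {"proof": "p_bad", "ok": False, "feedback": "toy diagnostic"},
            {"proof": "p_good", "ok": True, "feedback": ""},
        ],
    },
    {
        "statement": "s2",
        "retrieved": [],
        "attempts": [{"proof": "q_bad", "ok": False, "feedback": "toy diagnostic"}],
    },
]
toy_corpus: Dict[str, List[str]] = {}
sft, unresolved = route_trajectories(toy_trajectories, toy_corpus)
```

Proofs that pass on the first attempt still enter the corpus here but produce no repair example, since there is no failure to learn a fix from.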
Topics
- Formal Theorem Proving
- Lean 4
- Agentic Proving
- OProver
- OProofs Corpus
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.