Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation
Summary
Pythagoras-Prover is a new open-source family of compute-efficient Lean theorem provers, comprising 4B and 32B parameter autoregressive models and a 4B diffusion-based model. It relies on a compute-frugal data pipeline that combines a stratified 800K Lean-verified corpus for curriculum supervised fine-tuning with dynamic proof-reasoning filtering. Its key innovation, Augmented Lean Formalisation (ALF), expands the corpus by approximately 2.5 times through structured mutations and self-distillation, bypassing costly per-instance verification. Empirically, Pythagoras-Prover-4B outperforms DeepSeek-Prover-V2-671B on MiniF2F-Test at pass@32 (86.1% vs 82.4%), despite being roughly 168 times smaller (4B vs 671B parameters). The 32B model achieves 93.0% on MiniF2F-Test and solves 93 problems on PutnamBench, setting a new open-source state of the art. The diffusion model is 2.58 times faster but reaches only 63.25% accuracy.
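The summary does not say how pass@32 is computed. A minimal sketch, assuming the commonly used unbiased pass@k estimator over n sampled proofs per problem of which c pass Lean verification; the function names `pass_at_k` and `benchmark_pass_at_k` are illustrative and not taken from the paper:

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for one problem.

    n: proofs sampled, c: proofs that verified, k: attempt budget.
    Assumption: this is the standard estimator 1 - C(n-c, k) / C(n, k);
    the summarized paper may compute pass@32 differently.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k failures exist, so any k samples contain a success.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

When exactly k proofs are sampled per problem (n = k = 32), the estimate reduces to the fraction of problems with at least one verified proof.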
Key takeaway
Machine learning engineers developing theorem provers should prioritize data pipeline innovations over scaling model parameters alone. Implement curriculum learning on difficulty-stratified data and explore structured data augmentation such as ALF to boost performance on Lean proofs. This approach lets smaller models surpass larger baselines, making advanced formal reasoning more accessible and computationally efficient.
Key insights
Compute-efficient Lean theorem proving is achievable through innovative data augmentation and training strategies, challenging reliance on frontier-scale models.
Principles
- Data quality and structure can substitute for raw model scale.
- Progressive skill acquisition improves proof generation.
- Perturbing formal statements reveals model robustness.
Method
Pythagoras-Prover uses curriculum supervised fine-tuning on a stratified Lean-verified corpus, dynamic proof-reasoning filtering, and Augmented Lean Formalisation (ALF) for corpus expansion via self-distillation.
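The curriculum idea above can be sketched as splitting a verified corpus into stages of increasing difficulty and fine-tuning on them in order. This is an illustration only: the `ProofExample` type, the `difficulty` score, and the equal-size bucketing are assumptions, since the summary does not describe how the paper measures difficulty or defines its strata.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProofExample:
    statement: str
    proof: str
    difficulty: float  # hypothetical score; higher means harder


def build_curriculum(
    examples: list[ProofExample], num_stages: int = 3
) -> list[list[ProofExample]]:
    """Split examples into num_stages buckets, easiest bucket first.

    Assumption: near-equal bucket sizes. A real pipeline might instead
    stratify on fixed difficulty thresholds or on proof length.
    """
    if num_stages < 1:
        raise ValueError("num_stages must be at least 1")
    ordered = sorted(examples, key=lambda e: e.difficulty)
    size, extra = divmod(len(ordered), num_stages)
    stages, start = [], 0
    for i in range(num_stages):
        # The first `extra` stages absorb one leftover example each.
        end = start + size + (1 if i < extra else 0)
        stages.append(ordered[start:end])
        start = end
    return stages
```

A training loop would then iterate over the returned stages in order, running one fine-tuning phase per stage.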
In practice
- Stratify training data by difficulty for progressive learning.
- Use statement mutation to expand verified corpora efficiently.
- Evaluate provers on perturbed benchmarks to assess robustness.
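As a rough illustration of the last two points, the sketch below applies one simple statement mutation and measures how far the solve rate drops on the mutated set. The summary does not list ALF's actual mutation operators, so `LeanStatement`, `rotate_hypotheses`, and `robustness_gap` are hypothetical names, and hypothesis rotation is a stand-in chosen because it keeps the statement valid when hypotheses do not refer to one another.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class LeanStatement:
    name: str
    variables: tuple[str, ...]   # binders such as "x : Nat"
    hypotheses: tuple[str, ...]  # assumed independent of each other
    goal: str

    def render(self) -> str:
        binders = " ".join(f"({b})" for b in self.variables + self.hypotheses)
        sep = " " if binders else ""
        return f"theorem {self.name}{sep}{binders} : {self.goal} := by sorry"


def rotate_hypotheses(stmt: LeanStatement, shift: int = 1) -> LeanStatement:
    """Hypothetical mutation: rotate hypothesis order, keep variables first."""
    hyps = stmt.hypotheses
    k = shift % len(hyps) if hyps else 0
    return replace(stmt, name=f"{stmt.name}_rot{shift}", hypotheses=hyps[k:] + hyps[:k])


def solve_rate(solved: dict[str, bool]) -> float:
    """Fraction of problems marked solved."""
    return sum(solved.values()) / len(solved)


def robustness_gap(original: dict[str, bool], perturbed: dict[str, bool]) -> float:
    """Drop in solve rate from the original to the perturbed benchmark."""
    return solve_rate(original) - solve_rate(perturbed)
```

For example, rotating the hypotheses of a statement named `demo` yields a new statement `demo_rot1` whose rendered text still ends in `sorry`; in a real pipeline the mutated statement would be passed to the prover and the Lean checker rather than trusted as-is.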
Topics
- Lean Theorem Proving
- Automated Theorem Proving
- Augmented Lean Formalisation
- Diffusion Models
- Curriculum Learning
- MiniF2F-ALF Benchmark
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.