Pythagoras-Prover: Advancing Efficient Formal Proving via Augmented Lean Formalisation

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

Pythagoras-Prover is a new open-source family of compute-efficient Lean theorem provers comprising 4B and 32B parameter autoregressive models and a 4B diffusion-based model. Its compute-frugal data pipeline combines a difficulty-stratified corpus of 800K Lean-verified examples for curriculum supervised fine-tuning with dynamic proof-reasoning filtering. A key innovation, Augmented Lean Formalisation (ALF), expands the corpus by roughly 2.5 times through structured statement mutations and self-distillation, avoiding costly per-instance verification. Empirically, Pythagoras-Prover-4B outperforms DeepSeek-Prover-V2-671B on MiniF2F-Test at pass@32 (86.1% vs 82.4%) despite having about 168 times fewer parameters (4B vs 671B). The 32B model reaches 93.0% on MiniF2F-Test and solves 93 problems on PutnamBench, a new state of the art among open-source provers. The 4B diffusion model runs 2.58 times faster but reaches a lower accuracy of 63.25%.
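
The pass@32 figures count a problem as solved if at least one of 32 sampled proofs passes the Lean checker. The sketch below shows how such numbers are commonly computed. It assumes the widely used unbiased pass@k estimator from code-generation evaluation, because the summary does not say which estimator the authors use. `pass_at_k` and `benchmark_pass_at_k` are illustrative names.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: proof attempts sampled, c: attempts the Lean checker accepted,
    k: attempt budget. Returns the probability that at least one of k
    attempts drawn without replacement from the n samples is correct.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        return 1.0  # every size-k subset contains a verified proof
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over (n, c) pairs, one pair per benchmark problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)
```

With n = k = 32 the estimator reduces to checking whether any of the 32 attempts verified. Drawing n > k samples only lowers the variance of the estimate.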

Key takeaway

If you are a machine learning engineer building theorem provers, prioritize data pipeline innovations over scaling model parameters alone. Implement curriculum learning on difficulty-stratified data and explore structured data augmentation such as ALF to improve performance on Lean proofs. This approach lets smaller models surpass much larger baselines, making advanced formal reasoning more accessible and cheaper to run.
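
Curriculum fine-tuning on difficulty-stratified data can be sketched as follows. The code rests on three hypothetical choices made only for illustration: the difficulty signal (`ref_pass_rate`, for example a reference model's success rate on each statement), the tier thresholds, and the cumulative stage schedule. None of them is the paper's recipe.

```python
import random
from dataclasses import dataclass
from typing import Iterator


@dataclass
class Example:
    statement: str        # Lean theorem statement
    proof: str            # proof already accepted by the Lean checker
    ref_pass_rate: float  # hypothetical difficulty signal in [0, 1]; higher = easier


def stratify(examples: list[Example], easy_min: float = 0.7,
             hard_max: float = 0.3) -> list[list[Example]]:
    """Split examples into [easy, medium, hard] tiers (thresholds are assumptions)."""
    easy = [e for e in examples if e.ref_pass_rate >= easy_min]
    medium = [e for e in examples if hard_max <= e.ref_pass_rate < easy_min]
    hard = [e for e in examples if e.ref_pass_rate < hard_max]
    return [easy, medium, hard]


def curriculum_stages(tiers: list[list[Example]],
                      seed: int = 0) -> Iterator[list[Example]]:
    """Yield one shuffled stage per tier; stage i mixes tiers 0..i so easier
    material is replayed while harder proofs are introduced."""
    rng = random.Random(seed)
    for i in range(len(tiers)):
        stage = [e for tier in tiers[: i + 1] for e in tier]
        rng.shuffle(stage)
        yield stage
```

Replaying easier tiers in later stages is one common way to limit forgetting. A strict easy-then-hard schedule that trains on each tier alone is an equally plausible variant.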

Key insights

Compute-efficient Lean theorem proving is achievable through innovative data augmentation and training strategies, challenging reliance on frontier-scale models.

Principles

Data quality and structure can substitute for raw model scale.
Progressive skill acquisition improves proof generation.
Perturbing formal statements reveals model robustness.
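
To make the robustness principle concrete, pair each benchmark problem with perturbed variants of its statement and compare the prover's results on both. The sketch below assumes per-problem solved flags are already available for the original and perturbed statements, as a benchmark like MiniF2F-ALF would presumably provide. Its exact construction is not described here, and `robustness_report` is a hypothetical helper.

```python
def robustness_report(original: dict[str, bool],
                      perturbed: dict[str, list[bool]]) -> dict:
    """Compare a prover on original vs perturbed statements.

    original:  problem id -> solved on the unmodified statement
    perturbed: problem id -> one solved flag per perturbed variant
    """
    orig_acc = sum(original.values()) / len(original)
    flags = [ok for variants in perturbed.values() for ok in variants]
    pert_acc = sum(flags) / len(flags)
    # Problems solved as originally stated but failed on at least one variant.
    brittle = sorted(pid for pid, ok in original.items()
                     if ok and pid in perturbed and not all(perturbed[pid]))
    return {"original_acc": orig_acc, "perturbed_acc": pert_acc,
            "drop": orig_acc - pert_acc, "brittle": brittle}


report = robustness_report(
    {"p1": True, "p2": True, "p3": False},
    {"p1": [True, True], "p2": [True, False], "p3": [False, False]},
)
print(report["brittle"])  # ['p2']
```

A large accuracy drop or a long brittle list suggests the prover relies on the surface form of familiar statements rather than on their content.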

Method

Pythagoras-Prover uses curriculum supervised fine-tuning on a stratified Lean-verified corpus, dynamic proof-reasoning filtering, and Augmented Lean Formalisation (ALF) for corpus expansion via self-distillation.
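
The summary does not specify how dynamic proof-reasoning filtering decides what to keep. One common heuristic re-estimates the current model's pass rate on each statement and keeps only the statements it sometimes, but not always, solves. The sketch below implements that heuristic purely as an assumption. `attempt` is a hypothetical hook that samples one proof and reports whether Lean accepts it.

```python
from typing import Callable, Sequence


def dynamic_filter(pool: Sequence[str], attempt: Callable[[str], bool],
                   n_samples: int = 8, low: float = 0.0,
                   high: float = 1.0) -> list[str]:
    """Keep statements whose current empirical pass rate p satisfies low < p < high.

    attempt(statement) is a hypothetical hook: sample one proof from the
    current model and return True if the Lean checker accepts it.
    """
    kept = []
    for stmt in pool:
        p = sum(attempt(stmt) for _ in range(n_samples)) / n_samples
        # With the default bounds this drops always-solved and never-solved statements.
        if low < p < high:
            kept.append(stmt)
    return kept
```

The pass rates are recomputed against the current checkpoint, so rerunning the filter between training rounds is what makes it dynamic. Statements the model has mastered drop out, and previously unsolvable ones can re-enter.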

In practice

Stratify training data by difficulty for progressive learning.
Use statement mutation to expand verified corpora efficiently.
Evaluate provers on perturbed benchmarks to assess robustness.
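
Statement mutation can be illustrated with two truth-preserving transformations on a structured theorem header: consistent variable renaming and hypothesis reordering. This is a hypothetical sketch, not ALF itself, since this summary does not detail ALF's mutation operators. `TheoremSpec`, `rename_variables`, and `permute_hypotheses` are illustrative names. The sketch also assumes no hypothesis mentions another hypothesis, so reordering stays well-typed.

```python
import random
import re
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class TheoremSpec:
    """Hypothetical structured view of a Lean theorem header."""
    name: str
    variables: tuple[str, ...]   # bound variables, all of one type for simplicity
    var_type: str                # e.g. "ℝ"
    hypotheses: tuple[str, ...]  # e.g. "h₀ : x + y = 10"
    goal: str

    def to_lean(self) -> str:
        binders = f"({' '.join(self.variables)} : {self.var_type})"
        hyps = " ".join(f"({h})" for h in self.hypotheses)
        return f"theorem {self.name} {binders} {hyps} : {self.goal}"


def rename_variables(spec: TheoremSpec, mapping: dict[str, str]) -> TheoremSpec:
    """Alpha-rename variables; preserves truth as long as the new names are fresh."""
    if not mapping:
        return spec
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, mapping)) + r")\b")

    def sub(text: str) -> str:
        return pattern.sub(lambda m: mapping[m.group(1)], text)

    return replace(
        spec,
        name=spec.name + "_renamed",
        variables=tuple(mapping.get(v, v) for v in spec.variables),
        hypotheses=tuple(sub(h) for h in spec.hypotheses),
        goal=sub(spec.goal),
    )


def permute_hypotheses(spec: TheoremSpec, seed: int = 0) -> TheoremSpec:
    """Shuffle hypothesis order; safe only when hypotheses are independent."""
    hyps = list(spec.hypotheses)
    random.Random(seed).shuffle(hyps)
    return replace(spec, name=spec.name + "_perm", hypotheses=tuple(hyps))


spec = TheoremSpec("sum_diff", ("x", "y"), "ℝ",
                   ("h₀ : x + y = 10", "h₁ : x - y = 2"), "x = 6")
print(rename_variables(spec, {"x": "a", "y": "b"}).to_lean())
# theorem sum_diff_renamed (a b : ℝ) (h₀ : a + b = 10) (h₁ : a - b = 2) : a = 6
```

These mutations do not change whether a statement is true, so a proof of the original often transfers with little or no change. That is what makes such variants cheap training data. Mutations that alter constants or goals would instead need a fresh proof and a Lean check.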

Topics

Lean Theorem Proving
Automated Theorem Proving
Augmented Lean Formalisation
Diffusion Models
Curriculum Learning
MiniF2F-ALF Benchmark

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.