Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics
Summary
Formal Conjectures is a new, evolving benchmark comprising 2615 mathematical problem statements formalized in Lean 4, designed to evaluate advanced automated reasoning systems. This dataset includes 1029 open research conjectures, providing a zero-contamination benchmark for mathematical proof discovery, and 836 solved problems for proof autoformalization. Sourced from active mathematical research, the repository facilitates collaboration between mathematicians and AI systems. The benchmark has already led to new mathematical discoveries, including the resolution of open research conjectures. The project emphasizes correctness through a collaborative open-source model, where AI-generated proofs and disproofs act as an auditing mechanism. A standardized evaluation setup and baseline results on frozen subsets are provided to measure progress in automated reasoning on research-level mathematics.
Key takeaway
For AI scientists developing automated reasoning systems, this benchmark is a resource for evaluating capabilities on research-level mathematics. Integrate Formal Conjectures into your evaluation pipelines to measure progress in proof discovery and autoformalization. Its open conjectures have no known proofs, so a solution cannot have leaked into training data.
Key insights
Formal Conjectures offers a Lean 4 benchmark for evaluating automated reasoning on research-level mathematics.
Principles
- Zero-contamination benchmarks are crucial for proof discovery.
- AI-generated proofs can audit formalization correctness.
Method
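The project uses a collaborative open-source framework in which mathematicians formalize problems and AI systems attempt to solve them. AI-generated proofs and disproofs serve as an audit: an unexpected proof or disproof can expose a misformalized statement, which contributors then correct.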
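To make the auditing idea concrete, here is a minimal Lean 4 sketch, assuming Mathlib is available. It is not taken from the repository. The statement is a deliberately flawed version of Goldbach's conjecture that omits the hypothesis `2 < n`, and the theorem name is hypothetical. Because the flawed statement fails at `n = 0`, a short disproof exists, which is the kind of signal that would flag the formalization for review.

```lean
import Mathlib

-- Hypothetical example: a misformalized "Goldbach" statement with the
-- lower bound on n missing. It is false, and the disproof below shows why.
theorem goldbach_missing_bound_false :
    ¬ ∀ n : ℕ, Even n → ∃ p q : ℕ, p.Prime ∧ q.Prime ∧ p + q = n := by
  intro h
  -- Specialize to n = 0, which is even because 0 = 0 + 0.
  obtain ⟨p, q, hp, _, hpq⟩ := h 0 ⟨0, rfl⟩
  -- Every prime is at least 2, so p + q = 0 is impossible.
  have := hp.two_le
  omega
```

A disproof this short for a supposedly open problem suggests the formal statement, not the mathematics, is at fault.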
In practice
- Use Lean 4 for formalizing mathematical problems.
- Integrate AI systems for auditing formalizations.
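As an illustration of the first point, the following Lean 4 sketch states a well-known open conjecture (Goldbach's) against Mathlib. It is an assumption-laden example rather than an entry copied from the benchmark: the theorem name is hypothetical, and the repository may attach its own metadata and follow conventions not shown here. The proof is left as `sorry`, since the problem is unsolved.

```lean
import Mathlib

/-- Goldbach's conjecture: every even natural number greater than 2
is the sum of two primes. Stated only; no proof is known. -/
theorem goldbach_statement_sketch :
    ∀ n : ℕ, 2 < n → Even n → ∃ p q : ℕ, p.Prime ∧ q.Prime ∧ p + q = n := by
  sorry
```

A proof discovery system would be asked to replace `sorry` with a proof that Lean accepts, or to prove the negation of the statement instead.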
Topics
- Formal Conjectures
- Automated Reasoning Systems
- Lean 4
- Mathematical Proof Discovery
- Proof Autoformalization
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.