Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics
Summary
Formal Conjectures is a new, evolving benchmark for evaluating automated reasoning systems on research-level mathematical problems. The dataset currently comprises 2,615 problem statements formalized in Lean 4 and drawn from active areas of mathematical research. It includes 1,029 open research conjectures, which provide a zero-contamination benchmark for proof discovery, and 836 solved problems for proof autoformalization. The repository gives mathematicians a structured interface to the AI systems and people working on solutions. The benchmark has already facilitated new mathematical discoveries, including the resolution of open conjectures. The project takes a collaborative, open-source approach to formalization correctness, with AI-generated proofs serving as an auditing mechanism. It also provides standardized evaluation setups and baseline results, which show a measurable signal of progress in automated reasoning.
Key takeaway
For AI scientists developing automated reasoning systems, Formal Conjectures offers a contamination-free benchmark for measuring progress on unsolved mathematical problems. Integrating this Lean 4 dataset into your evaluation pipeline lets you assess proof discovery and proof autoformalization separately, on problems taken from current research rather than from material that may already appear in training data.
Key insights
Formal Conjectures is an open, evolving benchmark for evaluating automated reasoning on research-level mathematics using Lean 4.
Principles
- Open conjectures have no known proofs, so they give a zero-contamination test of novel discovery.
- Community collaboration enhances formalization correctness.
- AI-generated proofs can audit benchmark fidelity.
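The auditing principle can be illustrated with a deliberately flawed statement. The sketch below is hypothetical and is not taken from the benchmark: it assumes Mathlib is available, the theorem name is invented, and the hypothesis `n < 0` stands in for a typo of the intended `2 < n`. Because no natural number is negative, the statement is vacuously true, and a one-line proof exposes the formalization error instead of resolving the conjecture.

```lean
import Mathlib

/-- Hypothetical misformalization of a Goldbach-style statement.
The hypothesis was meant to be `2 < n` but reads `n < 0`,
which no natural number satisfies. -/
theorem goldbach_misformalized_illustrative (n : ℕ) (h : n < 0) (hn : Even n) :
    ∃ p q : ℕ, p.Prime ∧ q.Prime ∧ p + q = n :=
  -- The proof never touches primes: it only uses the impossible hypothesis `h`.
  absurd h (Nat.not_lt_zero n)
```

An unexpectedly short proof of a supposedly open statement is a signal that the statement should be reviewed before the result is counted as a discovery.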
Method
The benchmark formalizes mathematical problems in Lean 4 and sorts them into open conjectures, used for proof discovery, and solved problems, used for proof autoformalization. A structured interface supports collaboration, and AI-generated proofs are used to audit the formalizations and improve them iteratively.
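A minimal sketch of what the two kinds of entries could look like in Lean 4 follows. It assumes Mathlib is available; the theorem names are invented for illustration, and the benchmark's own file layout, naming, and metadata conventions are not shown here. The open conjecture is stated with `sorry` as a placeholder for the missing proof, while the solved statement is closed by a lemma that already exists in Mathlib.

```lean
import Mathlib

/-- Open conjecture (Goldbach): every even natural number greater than 2
is a sum of two primes. No proof is known, so the body is `sorry`. -/
theorem goldbach_illustrative :
    ∀ n : ℕ, 2 < n → Even n → ∃ p q : ℕ, p.Prime ∧ q.Prime ∧ p + q = n := by
  sorry

/-- Solved statement: there are arbitrarily large primes.
The proof is the existing Mathlib lemma `Nat.exists_infinite_primes`. -/
theorem infinitely_many_primes_illustrative :
    ∀ n : ℕ, ∃ p, n ≤ p ∧ p.Prime :=
  Nat.exists_infinite_primes
```

In this framing, a proof discovery system is asked to replace the `sorry` with a proof that Lean accepts, and an autoformalization system is asked to turn a known informal proof of a solved statement into one that Lean checks.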
In practice
- Use Formal Conjectures for AI proof discovery.
- Contribute formalizations to the open-source project.
- Evaluate automated reasoning systems with the benchmark.
Topics
- Formal Conjectures Benchmark
- Automated Reasoning
- Mathematical Discovery
- Lean 4 Formalization
- Proof Autoformalization
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.