Formal Conjectures: An Open and Evolving Benchmark for Verified Discovery in Mathematics

Source: Takara TLDR - Daily AI Papers · Field: Science & Research, Mathematics & Computational Sciences, Automated Reasoning · Depth: Expert, medium

Summary

Formal Conjectures is a new, evolving benchmark for evaluating automated reasoning systems on research-level mathematical problems. The dataset currently comprises 2615 problem statements formalized in Lean 4 and drawn from active areas of mathematical research. Of these, 1029 are open research conjectures, which provide a zero-contamination benchmark for proof discovery, and 836 are solved problems intended for proof autoformalization. The repository provides a structured interface that connects mathematicians with the AI systems and people working on solutions. The benchmark has already facilitated new mathematical discoveries, including the resolution of open conjectures. The project takes a collaborative, open-source approach to ensuring that formalizations are correct, with AI-generated proofs serving as an auditing mechanism. Standardized evaluation setups and baseline results are provided and show a measurable signal for progress in automated reasoning.
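
To make the two categories concrete, here is a minimal Lean 4 sketch, assuming Mathlib is available. The theorem names are illustrative, and the two statements (Goldbach's conjecture and the infinitude of primes) were chosen here for familiarity; they are not taken from the benchmark, and the repository's own conventions for tagging and organizing problems may differ.

```lean
import Mathlib

/-- An open conjecture (Goldbach), stated with the proof left as `sorry`.
    Illustrative only: a proof-discovery system would try to replace
    `sorry` with a proof that Lean accepts. -/
theorem goldbach_sketch :
    ∀ n : ℕ, 2 < n → Even n → ∃ p q : ℕ, p.Prime ∧ q.Prime ∧ p + q = n := by
  sorry

/-- A solved problem: there are infinitely many primes.
    The proof reuses the existing Mathlib lemma `Nat.exists_infinite_primes`. -/
theorem infinitely_many_primes_sketch :
    ∀ n : ℕ, ∃ p, n ≤ p ∧ p.Prime :=
  fun n => Nat.exists_infinite_primes n
```

In a setup like this, the open entry can only be checked once someone supplies a proof, whereas the solved entry has a known argument that a system can be asked to formalize.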

Key takeaway

For AI scientists developing automated reasoning systems, Formal Conjectures offers a critical, contamination-free benchmark to measure progress on unsolved mathematical problems. You should integrate this Lean 4-based dataset into your evaluation pipelines to accurately assess your system's capabilities in proof discovery and autoformalization, ensuring your models are tested against genuine research challenges.

Key insights

Formal Conjectures is an open, evolving benchmark for evaluating automated reasoning on research-level mathematics using Lean 4.

Method

The benchmark formalizes mathematical problems in Lean 4 and sorts them into open conjectures, used for proof discovery, and solved problems, used for autoformalization. A structured interface supports collaboration, and AI auditing drives iterative improvement of the formalizations.
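
One way AI auditing can surface a formalization error is sketched below in Lean 4, assuming Mathlib. The example is hypothetical and not drawn from the benchmark: a Goldbach-style statement is written with `→` where the final `∧` was intended, which makes it provable in two lines. A prover that closes a supposedly open conjecture this easily is a strong hint that the statement, not the mathematics, needs fixing.

```lean
import Mathlib

/-- A hypothetical misformalization: the conclusion uses `→` where `∧` was
    intended, so the witnesses only need to make the premise false. -/
theorem goldbach_misformalized :
    ∀ n : ℕ, 2 < n → Even n → ∃ p q : ℕ, p.Prime ∧ q.Prime → p + q = n := by
  intro n _ _
  -- Choose p = 0 and q = 0: since 0 is not prime, the implication holds vacuously.
  exact ⟨0, 0, fun h => absurd h.1 Nat.not_prime_zero⟩
```

The proof never uses the hypotheses on `n`, which is itself a useful signal when reviewing a formal statement.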

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.