Discover and Prove: An Open-source Agentic Framework for Hard Mode Automated Theorem Proving in Lean 4
Summary
A new open-source agentic framework, Discover and Prove (DAP), has been released to address the "Hard Mode" challenge in automated theorem proving (ATP) in Lean 4. In "Easy Mode" benchmarks the answer is embedded in the formal statement; in Hard Mode the system must discover the answer on its own before constructing a formal proof. To support evaluation in this setting, the authors also released MiniF2F-Hard and FIMO-Hard, reannotated Hard Mode variants of existing ATP benchmarks. DAP uses large language model (LLM) natural-language reasoning with explicit self-reflection to discover answers, then rewrites the Hard Mode statements as Easy Mode statements that existing ATP provers can handle. DAP set a new state of the art on CombiBench, raising the number of solved problems from 7 (the previous Pass@16 SOTA) to 10, and is the first system to formally prove 36 theorems in Hard Mode on PutnamBench.
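The difference between the two modes can be illustrated with a toy Lean 4 example. This is only a sketch: the theorem names, the `answer` placeholder, and the problem itself are made up for illustration and are not taken from MiniF2F-Hard, FIMO-Hard, or the DAP code. It assumes a recent Lean 4 toolchain in which the `omega` tactic is available.

```lean
-- Easy Mode: the answer (5) is already written into the statement,
-- so the prover only has to verify it.
theorem easy_mode (x : Nat) (h : x + 2 = 7) : x = 5 := by
  omega

-- Hard Mode: the answer is left as a hole (hypothetical placeholder).
-- A system must first discover that the answer is 5, fill it in,
-- and only then prove the resulting statement.
def answer : Nat := sorry

theorem hard_mode (x : Nat) (h : x + 2 = 7) : x = answer := by
  sorry
```

Once the placeholder is replaced by the discovered value, the Hard Mode statement has the same shape as `easy_mode`, which is what lets an existing prover take over.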
Key takeaway
AI scientists and machine learning engineers developing automated theorem provers should consider adopting Hard Mode benchmarks such as MiniF2F-Hard and FIMO-Hard. Doing so gives a more realistic assessment of a model's capabilities, particularly in answer discovery, and helps identify weaknesses that go beyond proof generation.
Key insights
Hard Mode ATP requires independent answer discovery before formal proof, revealing a significant gap between LLM answer accuracy and formal prover capability.
Principles
- Explicit self-reflection improves LLM reasoning.
- Hard Mode benchmarks reveal true model capabilities.
Method
DAP uses LLM natural-language reasoning and self-reflection to discover answers, then rewrites Hard Mode problems into Easy Mode for existing ATP provers.
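The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not DAP's actual implementation: the function names, the `{answer}` placeholder convention, and the `propose`, `critique`, and `prove` callables (stand-ins for an LLM, a self-reflection step, and an existing ATP prover) are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical marker for the unknown answer inside a Hard Mode statement.
PLACEHOLDER = "{answer}"


@dataclass
class Result:
    answer: str
    easy_statement: str
    proof: Optional[str]  # None if the prover failed


def discover_answer(
    problem: str,
    propose: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], Optional[str]],
    max_rounds: int = 3,
) -> str:
    """Propose an answer, then revise it while the critique raises objections.

    propose(problem, feedback) returns a candidate answer.
    critique(problem, candidate) returns feedback, or None to accept.
    """
    candidate = propose(problem, None)
    for _ in range(max_rounds):
        feedback = critique(problem, candidate)
        if feedback is None:  # self-reflection found no issue
            break
        candidate = propose(problem, feedback)
    return candidate


def to_easy_mode(hard_statement: str, answer: str) -> str:
    """Substitute the discovered answer into the Hard Mode statement."""
    return hard_statement.replace(PLACEHOLDER, answer)


def discover_and_prove(
    problem: str,
    hard_statement: str,
    propose: Callable[[str, Optional[str]], str],
    critique: Callable[[str, str], Optional[str]],
    prove: Callable[[str], Optional[str]],
) -> Result:
    """Discover an answer, rewrite to Easy Mode, and hand off to a prover."""
    answer = discover_answer(problem, propose, critique)
    easy_statement = to_easy_mode(hard_statement, answer)
    return Result(answer, easy_statement, prove(easy_statement))
```

Passing the model, the reflection step, and the prover in as plain callables keeps the sketch independent of any particular LLM API or Lean toolchain; in a real system each would wrap an external service.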
In practice
- Use MiniF2F-Hard for Hard Mode ATP evaluation.
- Apply LLM self-reflection for complex reasoning tasks.
Topics
- Automated Theorem Proving
- Hard Mode Benchmarks
- Discover and Prove (DAP)
- Large Language Models
- Lean 4
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.