Planning to Hammer: Difficulty-Aware Decomposition for Automating Rocq Proofs
Summary
Quarry is a planning-based proof synthesis framework for automating proofs in Rocq (formerly Coq). Introduced in 2026, it targets the limitations of existing reactive, monolithic approaches by combining large language models (LLMs), which handle high-level proof decomposition, with CoqHammer, which reliably discharges local goals. The system runs a "Generate–Rank–Solve" loop: LLMs propose several candidate decompositions, each is type-checked in Rocq, and the survivors are ranked by a difficulty model that estimates CoqHammer solvability from proof-state features. Quarry then recursively proves the resulting sublemmas within a bounded budget, turning long proofs into sequences of hammer-solvable obligations. On the CoqGym100, Wigderson100, and TransBench58 benchmarks, Quarry achieved success rates of 55%, 52%, and 16%, respectively, a 7%–13% improvement over the strongest baseline under a uniform 10-minute wall-clock budget. These results suggest that neural planning and symbolic execution can be coordinated effectively.
Key takeaway
For AI Engineers building automated theorem provers for Rocq, Quarry's planning-based framework is a design worth studying closely. Consider combining LLM-driven decomposition with difficulty-aware ranking and symbolic execution to improve proof automation. By separating planning from execution and prioritizing hammer-solvable subgoals, this approach can achieve higher success rates and more predictable costs than reactive, monolithic methods. Explore adapting the Generate–Rank–Solve loop to your own interactive theorem prover (ITP) environment.
Key insights
Quarry coordinates LLM-driven proof planning with symbolic execution to automate Rocq proofs, improving success rates.
Principles
- Proof planning should precede execution.
- Decompose hard goals into simpler pieces of roughly equal difficulty.
- Prioritize plans with hammer-solvable sublemmas.
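To make the last two principles concrete, here is a minimal Python sketch of plan ranking. It assumes each sublemma already carries a difficulty estimate in [0, 1] and that a plan's difficulty is the maximum over its sublemmas. The `Plan` class, the `HAMMER_THRESHOLD` cutoff, and the example numbers are hypothetical illustrations, not Quarry's actual code or parameters.

```python
from dataclasses import dataclass

# Hypothetical cutoff: a sublemma whose estimated difficulty falls below it is
# treated as hammer-solvable. Not a value taken from the paper.
HAMMER_THRESHOLD = 0.5


@dataclass
class Plan:
    name: str
    # Estimated difficulty of each sublemma in [0, 1]; lower means CoqHammer is
    # expected to close it more readily.
    sublemma_difficulties: list[float]


def plan_difficulty(plan: Plan) -> float:
    """A plan is only as easy as its hardest sublemma, so aggregate with max."""
    return max(plan.sublemma_difficulties)


def hammer_solvable(plan: Plan, threshold: float = HAMMER_THRESHOLD) -> bool:
    """True if every sublemma is predicted to fall below the hammer threshold."""
    return plan_difficulty(plan) < threshold


def rank_plans(plans: list[Plan]) -> list[Plan]:
    """Easiest plans first; ties keep their original (proposal) order."""
    return sorted(plans, key=plan_difficulty)


if __name__ == "__main__":
    candidates = [
        Plan("lopsided", [0.05, 0.05, 0.9]),  # two trivial pieces, one very hard one
        Plan("balanced", [0.4, 0.35, 0.45]),  # pieces of roughly equal difficulty
        Plan("monolithic", [0.8]),            # the goal left undecomposed
    ]
    for p in rank_plans(candidates):
        print(p.name, plan_difficulty(p), hammer_solvable(p))
```

With max aggregation the balanced plan ranks first, followed by the undecomposed goal and then the lopsided plan. Summing the estimates instead would place the lopsided plan (total 1.0) ahead of the balanced one (total 1.2), even though its 0.9 sublemma is the piece most likely to defeat the hammer.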
Method
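Quarry employs a Generate–Rank–Solve loop: LLMs propose decompositions, which are type-checked in Rocq; a difficulty model ranks them by estimated CoqHammer solvability; and Quarry then recursively proves the sublemmas, dispatching hammer-solvable obligations to CoqHammer, within a bounded budget.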
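The loop can be sketched in Python as follows. This is a minimal illustration, not Quarry's implementation: four hypothetical callables stand in for the real components (`propose` for the LLM, `type_checks` for the Rocq type check, `difficulty` for the difficulty model, and `hammer` for a CoqHammer call). The budget is modeled as a count of hammer calls rather than wall-clock time, and trying the hammer on a goal before decomposing it is a choice made for this sketch.

```python
from typing import Callable

Goal = str
Decomposition = list[Goal]


class Budget:
    """Remaining hammer calls; a stand-in for a bounded proof-search budget."""

    def __init__(self, hammer_calls: int) -> None:
        self.remaining = hammer_calls

    def spend(self) -> bool:
        """Consume one hammer call if any remain."""
        if self.remaining <= 0:
            return False
        self.remaining -= 1
        return True


def solve(
    goal: Goal,
    propose: Callable[[Goal], list[Decomposition]],
    type_checks: Callable[[Goal, Decomposition], bool],
    difficulty: Callable[[Goal], float],
    hammer: Callable[[Goal], bool],
    budget: Budget,
    depth: int = 0,
    max_depth: int = 3,
) -> bool:
    # Solve: try to close the goal directly with the (simulated) hammer.
    if budget.spend() and hammer(goal):
        return True
    if depth >= max_depth:
        return False
    # Generate: keep only non-empty decompositions that pass the type check.
    plans = [d for d in propose(goal) if d and type_checks(goal, d)]
    # Rank: easiest plan first, scoring each plan by its hardest sublemma.
    plans.sort(key=lambda d: max(difficulty(g) for g in d))
    for plan in plans:
        # Recurse: a plan succeeds only if every sublemma is proved.
        if all(
            solve(g, propose, type_checks, difficulty, hammer, budget, depth + 1, max_depth)
            for g in plan
        ):
            return True
        if budget.remaining <= 0:
            return False
    return False


# Hypothetical toy world: the hammer closes only goals named "easy*".
TOY_PLANS = {"hard": [["easy1", "mid"], ["easy1", "easy2"], ["bogus"]], "mid": [["easy3"]]}
TOY_DIFFICULTY = {"easy1": 0.1, "easy2": 0.2, "easy3": 0.1, "mid": 0.7, "bogus": 0.0}


def demo(hammer_calls: int) -> tuple[bool, int]:
    """Run the loop on the toy goal "hard"; return (proved?, hammer calls left)."""
    budget = Budget(hammer_calls)
    proved = solve(
        "hard",
        propose=lambda g: TOY_PLANS.get(g, []),
        type_checks=lambda g, d: "bogus" not in d,  # pretend Rocq rejects this plan
        difficulty=lambda g: TOY_DIFFICULTY.get(g, 1.0),
        hammer=lambda g: g.startswith("easy"),
        budget=budget,
    )
    return proved, budget.remaining


if __name__ == "__main__":
    print(demo(10))
    print(demo(1))
```

In the toy run, `demo(10)` proves the root goal through the lower-difficulty plan after three hammer calls, while `demo(1)` gives up once the budget is exhausted. Passing the components in as callables keeps the control flow of planning separate from the execution back end, which mirrors the separation of planning and execution described above.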
In practice
- Use proof-state features for difficulty estimation.
- Aggregate sublemma difficulties with max, so each plan is scored by its hardest sublemma.
- Train difficulty models offline from execution traces.
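The three practices above can be combined in a small difficulty model. The Python sketch below fits a logistic regression offline to hypothetical execution traces, each pairing a proof state with whether CoqHammer closed it, and reports difficulty as one minus the predicted success probability. The `ProofState` fields, the log scaling, the learning rate, and the toy traces are illustrative assumptions, not the paper's feature set or model.

```python
import math
from dataclasses import dataclass


@dataclass
class ProofState:
    # Hypothetical proof-state features; Quarry's real feature set may differ.
    num_hypotheses: int
    goal_size: int       # e.g. node count of the goal's term
    num_constants: int   # distinct global constants in the goal


def features(s: ProofState) -> list[float]:
    """Bias term plus log-scaled counts so very large goals do not dominate."""
    return [
        1.0,
        math.log1p(s.num_hypotheses),
        math.log1p(s.goal_size),
        math.log1p(s.num_constants),
    ]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train(traces: list[tuple[ProofState, bool]], lr: float = 0.15,
          epochs: int = 10_000) -> list[float]:
    """Fit logistic regression by batch gradient descent on offline hammer traces."""
    xs = [features(s) for s, _ in traces]
    ys = [1.0 if solved else 0.0 for _, solved in traces]
    w = [0.0] * len(xs[0])
    n = len(xs)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += err * xj / n
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w


def difficulty(w: list[float], s: ProofState) -> float:
    """Estimated difficulty: 1 - predicted probability that the hammer succeeds."""
    return 1.0 - sigmoid(sum(wi * xi for wi, xi in zip(w, features(s))))


def plan_difficulty(w: list[float], sublemmas: list[ProofState]) -> float:
    """Score a decomposition by its hardest sublemma (max aggregation)."""
    return max(difficulty(w, s) for s in sublemmas)


# Hypothetical traces: small goals were closed by the hammer, large ones were not.
TOY_TRACES = [
    (ProofState(1, 8, 1), True),
    (ProofState(2, 10, 2), True),
    (ProofState(3, 15, 3), True),
    (ProofState(6, 90, 12), False),
    (ProofState(8, 120, 15), False),
    (ProofState(10, 200, 20), False),
]

if __name__ == "__main__":
    w = train(TOY_TRACES)
    for state, solved in TOY_TRACES:
        print(state, solved, round(difficulty(w, state), 3))
```

Because training only needs recorded (state, outcome) pairs, the model can be refit offline whenever new hammer logs accumulate, and scoring a candidate plan at search time costs only a few arithmetic operations per sublemma.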
Topics
- Formal Verification
- Interactive Theorem Proving
- Rocq Proof Automation
- Large Language Models
- Proof Synthesis
- CoqHammer
Best for: AI Scientist, AI Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.