Planning to Hammer: Difficulty-Aware Decomposition for Automating Rocq Proofs

Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

Quarry is a planning-based proof synthesis framework for automating formal verification in Rocq (formerly Coq). Introduced in 2026, it pairs large language models (LLMs) for high-level proof decomposition with CoqHammer for reliable discharge of local goals. The system runs a "Generate–Rank–Solve" loop: LLMs propose multiple proof decompositions, Rocq type-checks them, and a difficulty model ranks the surviving candidates by estimated CoqHammer solvability, computed from proof-state features. Quarry then recursively proves the sublemmas within a bounded budget, turning a long proof into a sequence of hammer-solvable obligations. On the CoqGym100, Wigderson100, and TransBench58 benchmarks, Quarry achieved success rates of 55%, 52%, and 16%, respectively. This is a 7%–13% improvement over the strongest baseline under a uniform 10-minute wall-clock budget, and it indicates that neural planning and symbolic execution can be coordinated effectively.
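To make the ranking step concrete, here is a minimal sketch of a difficulty model that scores candidate decompositions. It is an illustration under stated assumptions, not Quarry's actual model: the feature names (`num_hypotheses`, `goal_size`, `num_quantifiers`), the hand-set weights, the logistic form, and the product rule for combining subgoal estimates are all hypothetical, since the summary does not say which proof-state features or model family are used.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class GoalFeatures:
    """Hypothetical proof-state features (not taken from the paper)."""
    num_hypotheses: int
    goal_size: int        # e.g. node count of the goal term
    num_quantifiers: int


# Hypothetical hand-set weights; a real model would be fitted to hammer outcomes.
BIAS = 2.0
W_HYPS = -0.05
W_SIZE = -0.03
W_QUANT = -0.4


def solvability(f: GoalFeatures) -> float:
    """Logistic estimate in (0, 1) that a hammer call closes the goal."""
    z = (BIAS
         + W_HYPS * f.num_hypotheses
         + W_SIZE * f.goal_size
         + W_QUANT * f.num_quantifiers)
    return 1.0 / (1.0 + math.exp(-z))


def plan_score(subgoals: list[GoalFeatures]) -> float:
    """Score a decomposition as the product of its subgoals' estimates.

    Assumes subgoals succeed independently, which is a simplification.
    """
    return math.prod(solvability(f) for f in subgoals)


def rank_plans(plans: dict[str, list[GoalFeatures]]) -> list[str]:
    """Return plan names, most promising first."""
    return sorted(plans, key=lambda name: plan_score(plans[name]), reverse=True)


# Toy comparison: one large goal versus a split into two small goals.
easy = GoalFeatures(num_hypotheses=2, goal_size=10, num_quantifiers=0)
hard = GoalFeatures(num_hypotheses=20, goal_size=120, num_quantifiers=4)
ranking = rank_plans({"monolithic": [hard], "split": [easy, easy]})
```

With these illustrative weights, the decomposition into two small goals ranks ahead of the single large goal, which is the behavior a difficulty-aware ranker is meant to encourage.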

Key takeaway

For AI engineers building automated theorem provers for Rocq, Quarry's main lesson is to separate planning from execution. Combining LLM-driven decomposition with difficulty-aware ranking and symbolic execution, and prioritizing hammer-solvable subgoals, can yield higher success rates and more predictable costs than reactive, monolithic methods. The Generate–Rank–Solve loop is also a candidate for adaptation to other interactive theorem prover (ITP) environments.

Key insights

Quarry coordinates LLM-driven proof planning with symbolic execution to automate Rocq proofs, improving success rates.

Method

Quarry runs a Generate–Rank–Solve loop: LLMs propose decompositions, Rocq type-checks them, and a difficulty model ranks them by estimated CoqHammer solvability. Quarry then recursively proves the resulting sublemmas within a bounded budget, using CoqHammer to discharge local goals.
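The loop can be sketched as a small recursive search. This is a hedged illustration, not Quarry's implementation: the callables `propose`, `type_checks`, `difficulty`, and `hammer` are hypothetical stand-ins for the LLM, the Rocq type checker, the difficulty model, and CoqHammer, and the choices to try the hammer first, to order plans by summed difficulty, and to bound recursion with `max_depth` plus a wall-clock `deadline` are assumptions made for the example.

```python
import time
from typing import Callable

Goal = str          # placeholder for a proof obligation
Plan = list[Goal]   # a decomposition into sublemmas


def solve(
    goal: Goal,
    *,
    propose: Callable[[Goal], list[Plan]],
    type_checks: Callable[[Goal, Plan], bool],
    difficulty: Callable[[Goal], float],
    hammer: Callable[[Goal], bool],
    deadline: float,
    depth: int = 0,
    max_depth: int = 3,
) -> bool:
    """Return True if the goal is proved within the time and depth budget."""
    if time.monotonic() >= deadline:
        return False                      # budget exhausted
    if hammer(goal):
        return True                       # Solve: goal closed locally
    if depth >= max_depth:
        return False
    # Generate: keep only decompositions that pass the type check.
    plans = [p for p in propose(goal) if type_checks(goal, p)]
    # Rank: lowest estimated total difficulty first.
    plans.sort(key=lambda p: sum(difficulty(g) for g in p))
    for plan in plans:
        if all(
            solve(sub, propose=propose, type_checks=type_checks,
                  difficulty=difficulty, hammer=hammer, deadline=deadline,
                  depth=depth + 1, max_depth=max_depth)
            for sub in plan
        ):
            return True
    return False


# Toy stand-ins: the "hammer" only closes goals without a conjunction,
# and the "planner" splits a goal at its first " & ".
def toy_hammer(goal: Goal) -> bool:
    return "&" not in goal


def toy_propose(goal: Goal) -> list[Plan]:
    left, sep, right = goal.partition(" & ")
    return [[left, right]] if sep else []


toy = dict(propose=toy_propose, type_checks=lambda g, p: True,
           difficulty=len, hammer=toy_hammer)
proved = solve("A & B & C", deadline=time.monotonic() + 5.0, **toy)
```

In the toy run, the top-level goal is not hammer-solvable, so it is split twice until every leaf is. Lowering `max_depth` or passing an expired `deadline` makes the same call fail, which shows how the budget bounds the search.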

Best for: AI Scientist, AI Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.