VERITAS: Verifier-Guided Proof Search for Zero-Shot Formal Theorem Proving
Summary
VERITAS is a zero-shot framework for LLM-based formal theorem proving that feeds the verifier's full output back into proof search. Where traditional methods reduce verifier feedback to a binary pass/fail, VERITAS routes all signals through a two-phase protocol. Phase 1 is Best-of-N sampling. Phase 2 is a critic-guided Monte Carlo Tree Search (MCTS) that takes the Phase 1 failures as explicit negative examples. VERITAS reaches 40.6% on miniF2F, compared with 36.9% for an independently run Best-of-5 baseline and 26.2% for Portfolio. On the newly released VERITAS-CombiBench, a 55-theorem combinatorics benchmark, VERITAS scores 7.3%, ahead of Best-of-5 (1.8%) and Portfolio (3.6%), which points to its effectiveness on problems that require iterative lemma name recovery. Artifacts are available on GitHub.
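As a sanity check on the VERITAS-CombiBench figures, the percentages can be converted back to solved-theorem counts. The sketch below assumes each score is a count divided by 55 and rounded to one decimal place; the summary itself does not state the counts, so the results are an inference, and `implied_count` is a hypothetical helper name.

```python
# Assumption: each reported score is solved / 55, rounded to one decimal place.
BENCHMARK_SIZE = 55
reported = {"VERITAS": 7.3, "Best-of-5": 1.8, "Portfolio": 3.6}


def implied_count(percent, total=BENCHMARK_SIZE):
    """Nearest whole number of theorems consistent with a reported percentage."""
    return round(percent / 100 * total)


for name, pct in reported.items():
    solved = implied_count(pct)
    # Recompute the percentage from the inferred count to confirm it round-trips.
    print(f"{name}: {solved}/{BENCHMARK_SIZE} = {100 * solved / BENCHMARK_SIZE:.1f}%")
```

Under that assumption the scores correspond to 4, 1, and 2 solved theorems, so the gaps between methods on this benchmark are a matter of a few theorems each.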
Key takeaway
For AI scientists and machine learning engineers building formal theorem provers, VERITAS shows that using the verifier's full output, rather than a single pass/fail bit, improves proof success rates. If your LLM-based prover collapses verifier feedback, consider a multi-phase search protocol that feeds failed attempts back as explicit negative examples. The reported gains are 40.6% versus 36.9% for Best-of-5 on miniF2F, and 7.3% versus 1.8% on VERITAS-CombiBench, where proofs require iterative lemma name recovery.
Key insights
VERITAS improves LLM theorem proving by integrating all verifier signals into a two-phase, feedback-driven search.
Principles
- Integrate all verifier signals.
- Use failed attempts as negative examples.
- Guided search outperforms unguided sampling.
Method
VERITAS employs Best-of-N sampling, then a critic-guided MCTS pass. This MCTS phase ingests Phase 1 failures as explicit negative examples to refine proof search.
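The two-phase flow described above can be sketched on a toy problem. This is not the VERITAS implementation: the tactic list, the `TARGET` proof, the `verify` stand-in for a proof checker, the `critic` scoring rule, and all function names are assumptions made for illustration. A random sampler stands in for the LLM, and the critic simply rewards the number of steps the toy verifier accepted while scoring known Phase 1 failures as zero.

```python
import math
import random
from dataclasses import dataclass, field

TACTICS = ["intro", "simp", "ring", "omega"]   # hypothetical action space
TARGET = ("intro", "simp", "omega")            # the only proof the toy verifier accepts
MAX_DEPTH = len(TARGET)


@dataclass(frozen=True)
class Verdict:
    ok: bool
    accepted_steps: int   # leading tactics accepted before the first failure
    message: str


def verify(proof):
    """Toy stand-in for a proof checker that reports where a proof first fails."""
    for i, tactic in enumerate(proof):
        if i >= len(TARGET) or tactic != TARGET[i]:
            return Verdict(False, i, f"step {i}: '{tactic}' failed")
    if len(proof) < len(TARGET):
        return Verdict(False, len(proof), "unsolved goals")
    return Verdict(True, len(proof), "ok")


def best_of_n(n, rng):
    """Phase 1: sample n full proofs, keeping every failure with its verdict."""
    failures = []
    for _ in range(n):
        proof = tuple(rng.choice(TACTICS) for _ in range(MAX_DEPTH))
        verdict = verify(proof)
        if verdict.ok:
            return proof, failures
        failures.append((proof, verdict))
    return None, failures


@dataclass
class Node:
    prefix: tuple
    visits: int = 0
    value: float = 0.0
    children: dict = field(default_factory=dict)


def critic(proof, verdict, negatives):
    """Score in [0, 1]: reward verifier progress, give known failures zero."""
    if proof in negatives:
        return 0.0
    return verdict.accepted_steps / (MAX_DEPTH + 1)


def ucb(child, parent, c):
    return child.value / child.visits + c * math.sqrt(math.log(parent.visits) / child.visits)


def mcts(negatives, iterations, rng, c=1.4):
    """Phase 2: MCTS over tactic prefixes, scored by the critic."""
    root = Node(())
    for _ in range(iterations):
        node, path = root, [root]
        # Selection and expansion: follow UCB until an untried tactic is added.
        while len(node.prefix) < MAX_DEPTH:
            untried = [t for t in TACTICS if t not in node.children]
            if untried:
                tactic = rng.choice(untried)
                node.children[tactic] = Node(node.prefix + (tactic,))
                node = node.children[tactic]
                path.append(node)
                break
            parent = node
            node = max(parent.children.values(), key=lambda ch: ucb(ch, parent, c))
            path.append(node)
        # Rollout: complete the prefix at random, then ask the verifier.
        rest = tuple(rng.choice(TACTICS) for _ in range(MAX_DEPTH - len(node.prefix)))
        proof = node.prefix + rest
        verdict = verify(proof)
        if verdict.ok:
            return proof
        # Backpropagation of the critic's score along the visited path.
        reward = critic(proof, verdict, negatives)
        for visited in path:
            visited.visits += 1
            visited.value += reward
    return None


def prove(n=5, iterations=2000, seed=0):
    rng = random.Random(seed)
    proof, failures = best_of_n(n, rng)
    if proof is not None:
        return proof
    negatives = {failed for failed, _ in failures}   # Phase 1 failures reused in Phase 2
    return mcts(negatives, iterations, rng)
```

Calling `prove()` runs Phase 1 and falls back to the tree search only when all N samples fail. In a real prover the random choices would be model proposals and `verify` would call the proof assistant; the point of the sketch is only the data flow, in which Phase 1 failures and the verifier's partial-progress signal both shape Phase 2.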
In practice
- Apply two-phase search to LLM tasks.
- Use verifier feedback for iterative refinement.
- Explore MCTS with negative examples.
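One way to act on the points above is to carry the verifier's messages into the next generation request instead of discarding them. The sketch below is a hypothetical prompt builder, not the format VERITAS uses: `FailedAttempt`, `build_retry_prompt`, the prompt wording, and the sample error text are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FailedAttempt:
    proof: str
    error: str   # raw verifier message, kept verbatim instead of reduced to "fail"


def build_retry_prompt(theorem, failures, max_examples=3):
    """Assemble a prompt that shows the most recent failures as negative examples."""
    lines = [f"Prove the following theorem:\n{theorem}", ""]
    if failures:
        lines.append("These earlier attempts were rejected by the verifier. Do not repeat them:")
        for i, attempt in enumerate(failures[-max_examples:], start=1):
            lines.append(f"Attempt {i}:\n{attempt.proof}\nVerifier error: {attempt.error}")
        lines.append("")
    lines.append("Write a complete proof.")
    return "\n".join(lines)


# Usage with an invented failure record (the lemma name and error are illustrative only).
history = [FailedAttempt("exact my_missing_lemma n", "unknown identifier 'my_missing_lemma'")]
print(build_retry_prompt("theorem demo (n : Nat) : n + 0 = n", history))
```

Keeping the error text verbatim matters most when the failure is a wrong lemma name, since the message tells the model which identifier to replace on the next attempt.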
Topics
- Formal Theorem Proving
- LLM-based Provers
- Verifier-Guided Search
- Monte Carlo Tree Search
- miniF2F Benchmark
- VERITAS-CombiBench
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.