VERITAS: Verifier-Guided Proof Search for Zero-Shot Formal Theorem Proving

Source: Machine Learning · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Automated Reasoning & Formal Methods · Depth: Expert, quick

Summary

VERITAS is a zero-shot framework that improves LLM-based formal theorem proving by feeding the verifier's full output back into proof search. Prior approaches typically collapse verifier feedback into a binary pass/fail signal. VERITAS instead routes all of it through a two-phase protocol: Phase 1 runs Best-of-N sampling, and Phase 2 runs a critic-guided Monte Carlo Tree Search (MCTS) that explicitly treats Phase 1 failures as negative examples. On miniF2F, VERITAS reaches 40.6%, compared with 36.9% for an independently run Best-of-5 baseline and 26.2% for Portfolio. On the newly released VERITAS-CombiBench, a 55-theorem combinatorics benchmark, VERITAS scores 7.3%, versus 1.8% for Best-of-5 and 3.6% for Portfolio, which points to an advantage on problems that require iteratively recovering correct lemma names. Artifacts are available on GitHub.
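
Because CombiBench has only 55 theorems, its reported rates correspond to small whole-number counts. The snippet below is a back-of-the-envelope check, not data from the paper: it rounds each reported rate to the nearest number of solved theorems and recomputes the percentage to confirm the rounding is consistent.

```python
# Benchmark size is stated in the summary; the solved counts are inferred
# from the reported rates, not taken from the paper.
COMBIBENCH_SIZE = 55


def solved_count(rate_percent: float, total: int) -> int:
    """Round a reported success rate back to a whole number of theorems."""
    return round(rate_percent / 100 * total)


reported = {"VERITAS": 7.3, "Best-of-5": 1.8, "Portfolio": 3.6}
counts = {name: solved_count(r, COMBIBENCH_SIZE) for name, r in reported.items()}
for name, n in counts.items():
    # Recompute the percentage from the integer count as a consistency check.
    print(f"{name}: {n}/{COMBIBENCH_SIZE} = {100 * n / COMBIBENCH_SIZE:.1f}%")
# VERITAS: 4/55 = 7.3%
# Best-of-5: 1/55 = 1.8%
# Portfolio: 2/55 = 3.6%
```

Read this way, each additional solved theorem moves a CombiBench rate by about 1.8 points, so the gaps above amount to a few theorems.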

Key takeaway

For AI scientists and machine learning engineers building formal theorem provers, VERITAS shows that using the full verifier signal, rather than a single pass/fail bit, measurably improves results. If your LLM-based prover collapses verifier feedback to pass/fail, consider a multi-phase search protocol that explicitly feeds failed attempts back as negative examples. VERITAS reaches 40.6% on miniF2F with this approach, and its gains on VERITAS-CombiBench suggest the benefit is largest on problems that require iteratively recovering lemma names.
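
A minimal sketch of "failed attempts as negative examples", assuming a hypothetical FailedAttempt record that keeps the verifier's error message instead of a bare failure flag. The prompt wording, the deduplication by error message, and the cap of three examples are illustrative choices, not details from VERITAS; the proofs and error strings in the usage lines are made up.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FailedAttempt:
    """A rejected proof plus the verifier's diagnostic (hypothetical schema)."""
    proof: str
    error: str


def build_retry_prompt(theorem: str, failures: List[FailedAttempt],
                       max_examples: int = 3) -> str:
    """Build a prompt that shows earlier failures as negative examples."""
    lines = [f"Theorem:\n{theorem}", ""]
    seen_errors = set()
    for attempt in failures:
        if len(seen_errors) == max_examples:
            break
        # Show one attempt per distinct error so the prompt stays short.
        if attempt.error in seen_errors:
            continue
        seen_errors.add(attempt.error)
        lines += [
            f"Failed attempt {len(seen_errors)} (do not repeat):",
            attempt.proof,
            f"Verifier error: {attempt.error}",
            "",
        ]
    lines.append("Write a new proof that avoids the errors above.")
    return "\n".join(lines)


# Illustrative failures; 'card_le_foo' is a deliberately nonexistent lemma name.
failures = [
    FailedAttempt("by simp [card_le_foo]", "unknown identifier 'card_le_foo'"),
    FailedAttempt("by rw [card_le_foo]", "unknown identifier 'card_le_foo'"),
    FailedAttempt("by linarith", "linarith failed"),
]
print(build_retry_prompt("theorem t (n : Nat) : n <= n + 1", failures))
```

Keeping the error text, not just the failed proof, is what lets the next attempt target a specific problem such as a wrong lemma name.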

Key insights

VERITAS improves LLM theorem proving by integrating all verifier signals into a two-phase, feedback-driven search.
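
The control flow of the two phases can be sketched as follows. The sample, verify, and search callables and the Attempt record are hypothetical placeholders for an LLM sampler, a proof checker such as Lean, and the Phase 2 search; n=5 only mirrors the Best-of-5 baseline, since this summary does not state the N that VERITAS uses. The point it illustrates is that Phase 1 keeps each failure's full diagnostics and hands all of them to Phase 2.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Attempt:
    """One verified candidate (hypothetical record)."""
    proof: str
    ok: bool
    messages: List[str] = field(default_factory=list)  # full verifier output


def two_phase_search(
    sample: Callable[[], str],
    verify: Callable[[str], Attempt],
    search: Callable[[List[Attempt]], Optional[str]],
    n: int = 5,
) -> Optional[str]:
    # Phase 1: Best-of-N sampling; stop at the first verified proof.
    failures: List[Attempt] = []
    for _ in range(n):
        attempt = verify(sample())
        if attempt.ok:
            return attempt.proof
        failures.append(attempt)  # keep diagnostics, not just "fail"
    # Phase 2: hand every Phase 1 failure to the feedback-driven search.
    return search(failures)


# Toy stand-ins: the sampler never succeeds, so Phase 2 must find the proof.
def toy_verify(proof: str) -> Attempt:
    ok = proof == "by norm_num"
    return Attempt(proof, ok, [] if ok else [f"error in {proof!r}"])


def toy_search(failures: List[Attempt]) -> Optional[str]:
    # A real Phase 2 would run critic-guided MCTS; this stub only skips
    # candidates that already failed in Phase 1.
    tried = {a.proof for a in failures}
    for candidate in ("by simp", "by omega", "by norm_num"):
        if candidate not in tried and toy_verify(candidate).ok:
            return candidate
    return None


print(two_phase_search(lambda: "by simp", toy_verify, toy_search))  # by norm_num
```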

Method

VERITAS first runs Best-of-N sampling (Phase 1), then a critic-guided MCTS pass (Phase 2). The MCTS phase ingests Phase 1 failures as explicit negative examples to steer the proof search.
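
The toy below sketches what a critic-guided MCTS that consumes Phase 1 failures might look like. Every concrete piece is an assumption: proofs are tuples of tactic names, verify is a stand-in that accepts one fixed sequence, critic is a hypothetical heuristic that lowers the value of partial proofs shared by many known failures, and failed complete proofs are pruned so they are never re-verified. Leaves are scored by the critic instead of random rollouts, and exhausted subtrees are closed so each iteration expands a new node. This summary does not say how the VERITAS critic actually uses the negatives.

```python
import math

# Toy stand-ins (hypothetical): a proof is a tuple of tactic names, and the
# "verifier" accepts one fixed sequence. A real system would call Lean.
TACTICS = ("intro", "simp", "ring", "omega", "linarith")
TARGET = ("intro", "simp", "ring")
MAX_DEPTH = 3
verifier_calls = []  # every proof sent to the verifier, for inspection


def verify(proof):
    verifier_calls.append(proof)
    return proof == TARGET


def critic(proof, negatives):
    """Hypothetical critic in [0, 0.5]: lower when many Phase 1 failures
    (negative examples) start with the same partial proof."""
    if not negatives:
        return 0.5
    overlap = sum(neg[: len(proof)] == proof for neg in negatives)
    return 0.5 * (1 - overlap / len(negatives))


class Node:
    def __init__(self, proof, parent=None):
        self.proof, self.parent = proof, parent
        self.children = {}
        self.visits, self.value_sum = 0, 0.0
        self.closed = False  # True once this subtree has nothing left to try


def legal_moves(proof, negatives):
    # Prune: never regenerate a complete proof that already failed in Phase 1.
    return [t for t in TACTICS if proof + (t,) not in negatives]


def exhausted(node, negatives):
    return all(m in node.children and node.children[m].closed
               for m in legal_moves(node.proof, negatives))


def ucb(child, parent_visits, c=1.4):
    mean = child.value_sum / child.visits
    return mean + c * math.sqrt(math.log(parent_visits) / child.visits)


def mcts(negatives, iterations=200):
    root = Node(())
    for _ in range(iterations):
        if root.closed:
            return None  # search space exhausted without a proof
        node = root
        # Selection: descend by UCB until a node has an untried tactic.
        while True:
            untried = [m for m in legal_moves(node.proof, negatives)
                       if m not in node.children]
            if untried:
                # Expansion: add one new child.
                child = Node(node.proof + (untried[0],), node)
                node.children[untried[0]] = child
                node = child
                break
            parent = node
            node = max((ch for ch in parent.children.values() if not ch.closed),
                       key=lambda ch: ucb(ch, parent.visits))
        # Evaluation: the verifier at full depth, the critic elsewhere.
        if len(node.proof) == MAX_DEPTH:
            if verify(node.proof):
                return node.proof
            value, node.closed = 0.0, True
        elif not legal_moves(node.proof, negatives):
            value, node.closed = 0.0, True  # dead end: every completion failed
        else:
            value = critic(node.proof, negatives)
        # Backpropagation, closing ancestors whose subtrees are exhausted.
        while node is not None:
            node.visits += 1
            node.value_sum += value
            if not node.closed and exhausted(node, negatives):
                node.closed = True
            node = node.parent
    return None


phase1_failures = {("intro", "simp", "omega"), ("intro", "ring", "ring")}
print(mcts(phase1_failures))  # ('intro', 'simp', 'ring')
```

In a real system, verify would call the proof checker and critic would be a model conditioned on the failed attempts. The UCB constant c=1.4 (about the square root of 2) is a common UCB1 default, not a VERITAS setting. Note that the critic here actually down-weights the correct "intro" branch, because both failures start with it; closing exhausted subtrees is what keeps the search complete despite a misleading critic.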

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.