Evaluating the Robustness of Proof Autoformalization in Lean 4
Summary
A new study evaluates the robustness of LLM-based proof autoformalization models, which translate natural language mathematical proofs into formal proofs in languages like Lean 4. Unlike prior work, which focuses on curated datasets, this research investigates model performance under two categories of perturbations: global and local. Global perturbations paraphrase the style of the informal proof without changing its content, so the formalization is expected to stay consistent. Local perturbations alter specific values, symbols, or proof steps, so the formalization must faithfully reflect these changes. The authors built a benchmark from the miniF2F and MATH-500 datasets to measure stability under global perturbations and faithfulness under local ones. An evaluation of seven recent models revealed significant sensitivity to global perturbations and a general failure to maintain faithfulness under local alterations.
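To make the notion of a local perturbation concrete, here is a small illustrative pair in Lean 4. It is not taken from the study's benchmark; the statements, the theorem names `original` and `perturbed`, and the use of Mathlib's `linarith` tactic are assumptions chosen for illustration. Changing one value in the informal statement should change the corresponding value in the formal one.

```lean
import Mathlib

-- Informal statement: "If 2x + 3 = 11, then x = 4."
theorem original (x : ℝ) (h : 2 * x + 3 = 11) : x = 4 := by
  linarith

-- Locally perturbed statement: "If 2x + 3 = 13, then x = 5."
-- A faithful formalization must carry the new values 13 and 5.
theorem perturbed (x : ℝ) (h : 2 * x + 3 = 13) : x = 5 := by
  linarith
```

A model that returns the first theorem for the second informal statement still produces a proof that compiles, yet it is unfaithful to its input. A global perturbation, by contrast, would only reword the informal sentence, and the first theorem would remain the expected output.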
Key takeaway
Research scientists developing or deploying LLM-based proof autoformalization systems should recognize that current models are fragile: they are sensitive to stylistic paraphrasing and often fail to reflect minor changes in proof details. Prioritize robustness mechanisms that keep formalizations faithful across varied informal proof inputs, rather than relying solely on performance metrics from curated datasets.
Key insights
Current LLM-based proof autoformalization models lack robustness to stylistic changes and specific alterations in informal mathematical proofs.
Principles
- Robust autoformalization demands faithfulness to input variations.
- Models must maintain consistency under global style changes.
- Faithfully reflecting local alterations is a key metric.
Method
Formulate global and local proof perturbations. Build a benchmark on miniF2F and MATH-500. Measure how stable formalization correctness remains under global changes and how faithfully formalizations track local alterations.
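One way to operationalize these two measurements is sketched below. The summary does not give the study's exact metric definitions, so the scoring rules here are assumptions, and `formalize`, `compiles`, and `is_faithful` are hypothetical callables standing in for the model under test, a Lean 4 check, and a faithfulness judge.

```python
from typing import Callable, Sequence

Formalizer = Callable[[str], str]            # informal proof -> formal proof text
Checker = Callable[[str], bool]              # formal proof text -> accepted or not
FaithfulnessJudge = Callable[[str, str], bool]  # (informal, formal) -> faithful or not


def stability_rate(
    original: str,
    paraphrases: Sequence[str],
    formalize: Formalizer,
    compiles: Checker,
) -> float:
    """Share of global paraphrases whose outcome matches the original's outcome.

    Assumed definition: a paraphrase counts as stable when its formalization
    is accepted or rejected exactly as the original's formalization is.
    """
    if not paraphrases:
        raise ValueError("at least one paraphrase is required")
    baseline = compiles(formalize(original))
    agreeing = sum(compiles(formalize(p)) == baseline for p in paraphrases)
    return agreeing / len(paraphrases)


def faithfulness_rate(
    perturbed_proofs: Sequence[str],
    formalize: Formalizer,
    is_faithful: FaithfulnessJudge,
) -> float:
    """Share of locally perturbed proofs whose formalization reflects the change."""
    if not perturbed_proofs:
        raise ValueError("at least one perturbed proof is required")
    faithful = sum(is_faithful(p, formalize(p)) for p in perturbed_proofs)
    return faithful / len(perturbed_proofs)
```

Passing the model and the checkers in as callables keeps the scoring logic independent of any particular model API or Lean toolchain, so the same functions can be exercised with simple stubs before being wired to real components.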
In practice
- Test autoformalizers with diverse informal proof styles.
- Verify model output against specific local input changes.
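As a rough illustration of the second check, the sketch below swaps one number in an informal statement and then tests whether the new number, and not the old one, appears in the formal output. This literal comparison is only a crude proxy and is not the study's method; the helper names `swap_number` and `reflects_swap` are hypothetical. It will misjudge cases where the old value legitimately appears elsewhere in the formalization.

```python
import re


def swap_number(text: str, old: str, new: str) -> str:
    """Replace standalone occurrences of the number `old` with `new`.

    The digit lookarounds prevent matching inside a longer number,
    so swapping "11" leaves "111" untouched.
    """
    pattern = rf"(?<!\d){re.escape(old)}(?!\d)"
    return re.sub(pattern, new, text)


def reflects_swap(formal: str, old: str, new: str) -> bool:
    """Heuristic: the formal text mentions `new` and no longer mentions `old`."""
    numbers = set(re.findall(r"\d+", formal))
    return new in numbers and old not in numbers


# Build a locally perturbed informal statement.
informal = "If 2x + 3 = 11, then x = 4."
perturbed = swap_number(informal, "11", "13")

# Two candidate outputs: one updated, one stale copy of the original.
updated_formal = "theorem t (x : ℝ) (h : 2 * x + 3 = 13) : x = 5"
stale_formal = "theorem t (x : ℝ) (h : 2 * x + 3 = 11) : x = 4"
```

In this example `reflects_swap(updated_formal, "11", "13")` holds and `reflects_swap(stale_formal, "11", "13")` does not. A stronger check would compare the formal statement against a reference formalization of the perturbed input rather than scanning for literals.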
Topics
- Proof Autoformalization
- Lean 4
- LLM Robustness
- Mathematical Proofs
- Perturbation Analysis
- miniF2F Dataset
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.