Proof-Refactor: Refactoring Generated Formal Proofs into Modular Artifacts
Summary
Proof-Refactor is an agentic framework for improving the readability, modularity, maintainability, and reusability of formal proofs generated by Large Language Models (LLMs). LLMs excel at generating proofs, but their outputs often fall short of the quality found in mature formal mathematics libraries, partly because a "compile-first" objective favors monolithic scripts. Unlike existing approaches that optimize length-based metrics, Proof-Refactor follows a process-guided methodology inspired by human proof-refactoring workflows. It decomposes refactoring into four phases: extracting candidate proof fragments, designing helper declarations, formally proving the extracted and designed components, and repairing the original proof with the verified components. Evaluated on generated Lean proofs from PutnamBench and Putnam2025, Proof-Refactor achieved higher rubric-based refactoring scores than a Claude Code refactoring baseline, with notable improvements in signature quality and human readability.
Key takeaway
Research scientists developing LLM-based proof generation systems should consider integrating process-guided refactoring frameworks like Proof-Refactor into their pipelines. Length-based metrics alone are an insufficient measure of proof quality; prioritize modularity, readability, and reusability instead. Multi-phase agentic approaches can decompose complex refactoring tasks and yield higher-quality, library-ready formal proofs that are easier to maintain and reuse in practice.
Key insights
LLM-generated proofs benefit from process-guided refactoring to enhance modularity and readability beyond simple length optimization.
Principles
- Length metrics alone do not capture the quality of LLM-generated proofs.
- Process-guided refactoring improves proof structure.
- Human refactoring workflows offer a model.
Method
Proof-Refactor decomposes refactoring into four phases: extracting candidate proof fragments, designing helper declarations, formally proving the resulting components, and repairing the original proof with the verified components.
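The four phases can be pictured as a simple pipeline. The Python sketch below is only an illustration of that control flow, not the framework's actual implementation: the `Helper` class, the `refactor_proof` function, and the four callables it accepts are hypothetical names introduced here, and the fallback to the original proof when no helper verifies is an assumption.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Helper:
    """A hypothetical helper declaration extracted from a larger proof."""
    name: str
    statement: str
    proof: str = ""
    verified: bool = False


def refactor_proof(
    original: str,
    extract: Callable[[str], List[str]],
    design: Callable[[str], Helper],
    prove: Callable[[Helper], Helper],
    repair: Callable[[str, List[Helper]], str],
) -> str:
    """Run the four phases in order and return the refactored proof text."""
    # Phase 1: extract candidate proof fragments from the original proof.
    fragments = extract(original)
    # Phase 2: design a helper declaration for each fragment.
    helpers = [design(fragment) for fragment in fragments]
    # Phase 3: attempt to prove each helper and keep only the verified ones.
    verified = [h for h in map(prove, helpers) if h.verified]
    # Assumption: if nothing was verified, keep the original proof unchanged.
    if not verified:
        return original
    # Phase 4: repair the original proof using the verified helpers.
    return repair(original, verified)
```

In a real system each callable would wrap an LLM agent or a call to the Lean checker; passing them in as parameters keeps the phase ordering separate from how any single phase is carried out.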
In practice
- Apply agentic frameworks for proof refinement.
- Prioritize modularity over proof length.
- Assess refactoring quality with rubric-based scoring.
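As one way to picture rubric-based scoring, the sketch below combines per-dimension ratings into a single weighted average. It is a minimal illustration under stated assumptions: the summary names signature quality and human readability as rubric dimensions, but the `rubric_score` function, the numeric scale, and the weights are hypothetical and not taken from the paper.

```python
from typing import Dict


def rubric_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Return the weighted average of ratings over the weighted dimensions."""
    # Every weighted dimension must have a rating.
    missing = set(weights) - set(ratings)
    if missing:
        raise ValueError(f"missing ratings for: {sorted(missing)}")
    total_weight = sum(weights.values())
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Weighted sum of ratings, normalized by the total weight.
    weighted_sum = sum(ratings[name] * weights[name] for name in weights)
    return weighted_sum / total_weight


# Hypothetical ratings on an assumed 1-5 scale, weighted equally.
example = rubric_score(
    {"signature_quality": 4.0, "human_readability": 2.0},
    {"signature_quality": 1.0, "human_readability": 1.0},
)
```

Comparing such scores between a refactored proof and a baseline refactoring gives a quality signal that does not depend on proof length.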
Topics
- Large Language Models
- Formal Proofs
- Proof Refactoring
- Agentic AI
- Lean
- PutnamBench
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.