LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization
Summary
LeanMarathon is a multi-agent harness designed for reliable, long-horizon autoformalization of research mathematics in Lean 4. It addresses challenges like statement drift and dependency tangles in large-scale formalization by employing an "evolving blueprint" that functions as a formal proof skeleton, natural-language proof graph, and shared system of record. Four contract-scoped agents—Blueprinter, Target-Reviewer, Worker, and Refiner—collaborate under a two-stage orchestrator that first stabilizes target fidelity through adversarial review, then discharges the proof DAG from its dynamic leaves upward in parallel, CI-gated rounds. This system successfully formalized all seven target theorems across two research papers covering four Erdős problems (#1051, #1196, #164, #1217), proving 258 lemmas and theorems. Total costs ranged from \$189 to \$624 per run, significantly outperforming a commercial single-agent baseline.
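The second orchestrator stage described above (discharging the proof DAG from its leaves upward in parallel, CI-gated rounds) can be pictured with a minimal Python sketch. It is an illustration under assumptions, not LeanMarathon's implementation: `discharge`, `prove`, `ci_gate`, and `max_rounds` are hypothetical names, and the dependency graph is fixed here, whereas the summary describes leaves that change as the blueprint evolves.

```python
from concurrent.futures import ThreadPoolExecutor


def discharge(deps, prove, ci_gate, max_rounds=10):
    """Discharge a proof DAG from its leaves upward, one gated round at a time.

    deps    : dict mapping each lemma name to the set of lemmas it depends on
    prove   : hypothetical worker, called as prove(name) -> attempt
    ci_gate : hypothetical deterministic check, ci_gate(name, attempt) -> bool
    """
    proved = set()
    for _ in range(max_rounds):
        # Current leaves: unproved lemmas whose dependencies are all proved.
        leaves = sorted(n for n in deps if n not in proved and deps[n] <= proved)
        if not leaves:
            break
        # Attempt all current leaves in parallel.
        with ThreadPoolExecutor() as pool:
            attempts = list(pool.map(prove, leaves))
        # Only attempts that pass the gate are accepted; the rest are retried.
        for name, attempt in zip(leaves, attempts):
            if ci_gate(name, attempt):
                proved.add(name)
    return proved
```

Because acceptance depends only on the gate, a failed attempt leaves the set of proved lemmas untouched and the lemma simply reappears as a leaf in the next round.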
Key takeaway
AI architects designing systems for long-horizon formal verification should prioritize multi-agent harness designs over monolithic approaches. LeanMarathon demonstrates that decomposing complex tasks into contract-scoped agents with deterministic CI gates prevents goal drift, context rot, and compute exhaustion, enabling reliable formalization of entire research papers. Implement external verification and bounded agent scopes so that systems remain coherent and recoverable across extended operations.
Key insights
Long-horizon autoformalization requires durable multi-agent harnesses with fault containment to ensure reliability and prevent drift.
Principles
- Decompose work into a proof DAG that evolves as formalization proceeds.
- Verify externally or deterministically, not via self-assessment.
- Restrict agent tool scope to bounded regions.
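The third principle, bounded tool scope, can be sketched as a simple path check applied before an agent is allowed to write a file. This is a hypothetical illustration: the function name `in_scope`, the repository layout, and the idea of enforcing scope by file path are assumptions, not details taken from the paper.

```python
from pathlib import PurePosixPath


def in_scope(path, allowed_roots):
    """Return True if a repo-relative path lies under one of the allowed roots."""
    parts = PurePosixPath(path).parts
    # Reject empty paths, absolute paths, and any attempt to climb with "..".
    if not parts or parts[0] == "/" or ".." in parts:
        return False
    for root in allowed_roots:
        root_parts = PurePosixPath(root).parts
        # The path is in scope if it starts with the root's components.
        if parts[:len(root_parts)] == root_parts:
            return True
    return False


# Hypothetical usage: a worker assigned one lemma file may not touch targets.
worker_scope = ["Proofs/Lemma3.lean"]
print(in_scope("Proofs/Lemma3.lean", worker_scope))
print(in_scope("Targets/Main.lean", worker_scope))
```

A check of this kind is deterministic and sits outside the agent, which is in keeping with the second principle as well.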
Method
A multi-agent harness pairs an evolving blueprint with a two-stage orchestrator: adversarial target review first, then parallel, CI-gated proof discharge starting from the dynamic leaves of the proof DAG.
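To make the idea of a blueprint acting as a "formal proof skeleton" concrete, here is a toy Lean 4 sketch. The lemma names and statements are illustrative assumptions, not taken from the paper: the target is stated and reduced to two helper lemmas whose proofs are still open (`sorry`), so the helpers are the current leaves of the proof DAG.

```lean
-- Hypothetical skeleton: names and statements are illustrative only.

-- Leaf lemmas, stated but not yet proved.
theorem helper_a (n : Nat) : n + 0 = n := by
  sorry

theorem helper_b (n : Nat) : 0 + n = n := by
  sorry

-- Target theorem: its proof depends only on the two helpers,
-- so it is complete once both leaves are discharged.
theorem target (n : Nat) : n + 0 = 0 + n :=
  (helper_a n).trans (helper_b n).symm
```

In a skeleton of this shape, the statement of `target` can be reviewed and frozen before any leaf is proved, which mirrors the ordering of the two orchestrator stages.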
In practice
- Formalize complex research papers using multi-agent systems.
- Extend existing formalizations incrementally with new targets.
- Employ CI gates for structural proof verification.
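As a rough picture of what a structural CI gate might check, the following Python sketch rejects a submitted Lean file if it still contains `sorry` or if a frozen target statement no longer appears verbatim. Both checks, and the names `structural_gate` and `frozen_statements`, are assumptions made for illustration. A real gate would also compile the project with Lean, and the text search here would flag `sorry` inside comments too.

```python
import re


def structural_gate(source, frozen_statements):
    """Return a list of violations; an empty list means the gate passes.

    source            : text of the submitted Lean file
    frozen_statements : dict of name -> statement text agreed during review
    """
    problems = []
    # Check 1: no unfinished proofs remain.
    if re.search(r"\bsorry\b", source):
        problems.append("contains sorry")
    # Check 2: every frozen statement still appears, ignoring whitespace changes.
    normalized = " ".join(source.split())
    for name, statement in frozen_statements.items():
        if " ".join(statement.split()) not in normalized:
            problems.append(f"statement of {name} changed or missing")
    return problems
```

Returning the violations rather than a bare boolean gives the orchestrator something concrete to hand back to the agent whose attempt was rejected.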
Topics
- Lean 4
- Autoformalization
- Multi-agent Systems
- Formal Verification
- Erdős Problems
- Agent Durability
Best for: AI Scientist, Research Scientist, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.