LeanMarathon: Toward Reliable AI Co-Mathematicians through Long-Horizon Lean Autoformalization

2026-06-01 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences, Software Development & Engineering · Depth: Expert, extended

Summary

LeanMarathon is a multi-agent harness for reliable, long-horizon autoformalization of research mathematics in Lean 4. It targets two failure modes of large-scale formalization, statement drift and dependency tangles, with an "evolving blueprint" that serves as a formal proof skeleton, a natural-language proof graph, and a shared system of record. Four contract-scoped agents (Blueprinter, Target-Reviewer, Worker, and Refiner) collaborate under a two-stage orchestrator. The first stage stabilizes target fidelity through adversarial review. The second discharges the proof DAG from its dynamic leaves upward in parallel, CI-gated rounds. The system formalized all seven target theorems across two research papers covering four Erdős problems (#1051, #1196, #164, #1217), proving 258 lemmas and theorems. Total costs ranged from $189 to $624 per run, and the system significantly outperformed a commercial single-agent baseline.
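To make "formal proof skeleton" concrete, here is a minimal hypothetical Lean 4 sketch. The lemma names and toy statements are illustrative assumptions and are not taken from the paper. The target's proof is already assembled from lemma statements whose own proofs are left as `sorry`. Each `sorry` is an open node that a worker would later discharge without touching the statements.

```lean
-- Hypothetical blueprint skeleton: statements are fixed, proofs are pending.
-- Leaf nodes of the proof DAG: proofs left as `sorry` for workers to fill in.
theorem lemma_a (n : Nat) : n + 0 = n := by sorry
theorem lemma_b (n : Nat) : 0 + n = n := by sorry

-- Target node: its proof is already wired to the leaves it depends on.
theorem target (n : Nat) : n + 0 = 0 + n := by
  rw [lemma_a, lemma_b]

-- While the skeleton is incomplete, this reports a dependence on `sorryAx`.
#print axioms target
```

The skeleton compiles as it stands, so the dependency structure is checked by Lean before any leaf is proved.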

Key takeaway

AI architects designing systems for long-horizon formal verification should prioritize multi-agent harness designs over monolithic approaches. LeanMarathon shows that decomposing a complex task into contract-scoped agents gated by deterministic CI checks prevents goal drift, context rot, and compute exhaustion, which makes it possible to formalize entire research papers reliably. Use external verification and bounded agent scopes so the system stays coherent and recoverable over extended runs.

Key insights

Long-horizon autoformalization requires durable multi-agent harnesses with fault containment to ensure reliability and prevent drift.

Principles

Decompose the work into a proof DAG that evolves as formalization proceeds.
Verify externally or deterministically, not via self-assessment.
Restrict agent tool scope to bounded regions.
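The principle of deterministic verification can be sketched as a gate that an agent cannot argue its way past. The sketch below is a hypothetical, lexical stand-in and is not LeanMarathon's actual CI. It rejects sources containing `sorry`, `admit`, or a new top-level `axiom`. It also detects statement drift by comparing whitespace-normalized hashes of target statements against values frozen after target review. A real gate would also compile the project with Lean and inspect axioms. This scan is deliberately conservative, so a `sorry` inside a comment also fails it.

```python
import hashlib
import re

# Hypothetical gate rules (assumptions for illustration, not the paper's CI).
FORBIDDEN = re.compile(r"\b(sorry|admit)\b|^\s*axiom\s", re.MULTILINE)


def statement_fingerprint(statement: str) -> str:
    """Hash a target statement after collapsing all whitespace runs."""
    normalized = " ".join(statement.split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def ci_gate(source: str,
            frozen_targets: dict[str, str],
            current_targets: dict[str, str]) -> list[str]:
    """Return failure reasons; an empty list means the gate passes.

    frozen_targets maps target names to fingerprints recorded after review;
    current_targets maps target names to their statements in the new revision.
    """
    failures = []
    if FORBIDDEN.search(source):
        failures.append("proof contains sorry/admit or a new axiom")
    for name, fingerprint in frozen_targets.items():
        stmt = current_targets.get(name)
        if stmt is None:
            failures.append(f"target {name} was deleted")
        elif statement_fingerprint(stmt) != fingerprint:
            failures.append(f"target {name} statement drifted")
    return failures
```

Because the gate is a pure function of the submitted revision, its verdict does not depend on any agent's self-assessment.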

Method

A multi-agent harness combines an evolving blueprint with a two-stage orchestrator. The first stage stabilizes the target statements through adversarial review. The second discharges the proof DAG from its dynamic leaves upward in parallel, CI-gated rounds.
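The second stage's leaf-upward discharge can be sketched as a round-based scheduler. This is a minimal sketch under stated assumptions. `deps` maps each node to the nodes its proof uses. A node counts as a dynamic leaf once all of its dependencies are proven. The hypothetical `prove` callable stands in for a worker attempt followed by a CI check. Leaves are recomputed every round, so edits a refiner makes to `deps` between rounds would be picked up.

```python
from collections.abc import Callable
from concurrent.futures import ThreadPoolExecutor


def discharge_rounds(deps: dict[str, set[str]],
                     prove: Callable[[str], bool],
                     max_rounds: int = 10) -> tuple[set[str], list[list[str]]]:
    """Prove DAG nodes from the leaves upward in parallel rounds.

    Returns the set of proven nodes and the leaves attempted in each round.
    `prove` is a hypothetical stand-in for "worker attempt + CI gate".
    """
    proven: set[str] = set()
    history: list[list[str]] = []
    for _ in range(max_rounds):
        # Dynamic leaves: unproven nodes whose dependencies are all proven.
        leaves = sorted(n for n, ds in deps.items()
                        if n not in proven and ds <= proven)
        if not leaves:
            break
        history.append(leaves)
        # Attempt every current leaf concurrently.
        with ThreadPoolExecutor() as pool:
            results = list(pool.map(prove, leaves))
        passed = {n for n, ok in zip(leaves, results) if ok}
        if not passed:
            break  # no progress this round; a real harness might refine the DAG
        proven |= passed
    return proven, history
```

For a DAG where `target` depends on `lemma_b` and `lemma_c`, and `lemma_c` depends on `lemma_a`, the rounds are `["lemma_a", "lemma_b"]`, then `["lemma_c"]`, then `["target"]`. Failed leaves stay in the frontier and are retried in the next round.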

In practice

Formalize complex research papers using multi-agent systems.
Extend existing formalizations incrementally with new targets.
Employ CI gates for structural proof verification.
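Extending an existing formalization with new targets means reopening only the obligations those targets actually need. The sketch below is hypothetical and does not describe the paper's procedure. It merges new targets, plus any new lemmas they introduce, into the existing dependency map and walks their transitive dependencies. It stops at nodes that are already proven, since their subtrees were discharged earlier.

```python
def open_obligations(deps: dict[str, set[str]],
                     proven: set[str],
                     additions: dict[str, set[str]]) -> set[str]:
    """Return the unproven nodes that the added targets transitively need.

    `additions` holds the new targets and any new lemmas they introduce
    (a hypothetical representation, assumed for illustration).
    """
    graph = {**deps, **additions}
    stack = list(additions)
    seen: set[str] = set()
    while stack:
        node = stack.pop()
        if node in seen or node in proven:
            continue  # already scheduled, or already discharged earlier
        seen.add(node)
        stack.extend(graph.get(node, set()))
    return seen
```

The returned set is the work for the next campaign. Previously proven lemmas and targets are reused rather than re-proved.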

Topics

Lean 4
Autoformalization
Multi-agent Systems
Formal Verification
Erdős Problems
Agent Durability

Best for: AI Scientist, Research Scientist, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.