Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory
Summary
Lean4Agent is presented as the first framework to apply a dependently typed formal language, specifically Lean4, to uniformly model and verify Large Language Model (LLM) agent workflows and execution trajectories. Its core is FormalAgentLib, an extensible Lean4 library that checks the semantic consistency of agent workflows under explicit assumptions and localizes execution-time failures. Building on this library, LeanEvolve refines workflows using verification feedback and, optionally, environment signals. The authors evaluate the framework on a hard-problem subset of SWE-Bench-Verified (Jimenez et al., 2024) and a subset of ELAIP-Bench (Dai et al., 2025), across five leading LLMs including GPT-5.2 (OpenAI, 2025) and GLM-5 (Zeng et al., 2026). Workflows that passed verification outperformed those that failed by an average of 11.94%, and LeanEvolve further improved SWE performance by an average of 7.47%.
Key takeaway
AI architects and machine learning engineers deploying LLM agents in high-stakes domains should consider integrating a formal verification framework such as Lean4Agent. In the reported experiments, this approach, built on a dependently typed language, improved workflow reliability and task performance by identifying structural and semantic inconsistencies before deployment and by guiding refinement from execution feedback. Adopting such a system may shorten debugging cycles and make autonomous agent systems more trustworthy.
Key insights
Lean4Agent uses a dependently typed formal language (Lean4) to verify and refine LLM agent workflows and trajectories.
Principles
- Formal methods enhance LLM agent reliability.
- Dependently typed languages can model complex agent behaviors.
- Verification-guided refinement improves workflow performance.
Method
Lean4Agent employs FormalAgentLib for three-layer verification (structural, semantic, trajectory) using Lean4. It then uses LeanEvolve, a dual-mode approach, to refine workflows based on verification diagnostics and environment feedback.
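To make the structural layer concrete, here is a minimal Lean 4 sketch of a workflow type indexed by its input and output stages, so that a mis-wired pipeline is rejected by the type checker. It is an illustration only: `Stage`, `Workflow`, and the step names are hypothetical and are not taken from FormalAgentLib, whose actual definitions the summary does not show.

```lean
-- Hypothetical sketch; none of these names come from FormalAgentLib.

/-- Abstract stages that an agent's working state can be in. -/
inductive Stage where
  | issue    -- raw task description
  | located  -- relevant files identified
  | patched  -- candidate patch produced
  | tested   -- patch run against the test suite
  deriving Repr, DecidableEq

/-- A workflow from stage `a` to stage `b`.
    The two indices make ill-formed compositions a type error. -/
inductive Workflow : Stage → Stage → Type where
  | step (name : String) (a b : Stage) : Workflow a b
  | seq {a b c : Stage} : Workflow a b → Workflow b c → Workflow a c

def localize : Workflow .issue .located := .step "localize" .issue .located
def patch : Workflow .located .patched := .step "patch" .located .patched
def runTests : Workflow .patched .tested := .step "run_tests" .patched .tested

/-- Type checks: each step's output stage matches the next step's input. -/
def repairFlow : Workflow .issue .tested :=
  .seq localize (.seq patch runTests)

-- Rejected by the type checker if uncommented, because `runTests`
-- expects a workflow ending in `patched`, not `located`:
-- def broken : Workflow .issue .tested := .seq localize runTests
```

The design choice here is to push structural well-formedness into the type of the workflow itself, so no separate checker pass is needed for this layer; semantic and trajectory checks would need additional machinery.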
In practice
- Use Lean4 to define agent workflow types.
- Apply Hoare-style contracts for semantic checks.
- Localize LLM agent failures via trajectory analysis.
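The last two points can be sketched in Lean 4 as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: steps are modeled as pure functions on a hypothetical `AgentState`, a Hoare-style contract is a plain proposition `Triple`, and a trajectory is assumed to be a recorded list of states checked against a Boolean invariant. All names are invented for the example.

```lean
-- Hypothetical sketch; names and modeling choices are assumptions.

/-- A toy agent state with three observable flags. -/
structure AgentState where
  filesLocated : Bool
  patchApplied : Bool
  testsPassed : Bool

/-- Hoare-style contract: if `P` holds before running `f`, `Q` holds after. -/
def Triple (P : AgentState → Prop) (f : AgentState → AgentState)
    (Q : AgentState → Prop) : Prop :=
  ∀ s, P s → Q (f s)

/-- Contracts compose: the postcondition of `f` feeds the precondition of `g`. -/
theorem Triple.seq {P Q R : AgentState → Prop} {f g : AgentState → AgentState}
    (h₁ : Triple P f Q) (h₂ : Triple Q g R) :
    Triple P (fun s => g (f s)) R :=
  fun s hP => h₂ (f s) (h₁ s hP)

/-- A toy step that marks the patch as applied. -/
def applyPatch (s : AgentState) : AgentState :=
  { s with patchApplied := true }

/-- `applyPatch` preserves `filesLocated` and establishes `patchApplied`. -/
theorem applyPatch_spec :
    Triple (fun s => s.filesLocated = true) applyPatch
      (fun s => s.filesLocated = true ∧ s.patchApplied = true) := by
  intro s h
  exact ⟨h, rfl⟩

/-- Index of the first recorded state that violates `inv`, if any. -/
def firstViolation (inv : AgentState → Bool) : List AgentState → Option Nat
  | [] => none
  | s :: rest =>
    if inv s then (firstViolation inv rest).map (· + 1) else some 0

-- The second recorded state (index 1) breaks the invariant.
#eval firstViolation (fun s => s.filesLocated)
  [⟨true, false, false⟩, ⟨false, false, false⟩]  -- some 1
```

Real agent steps are nondeterministic LLM calls, so a faithful model would likely state contracts over relations or monadic computations rather than pure functions; the pure version is used here only to keep the example short.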
Topics
- LLM Agents
- Formal Verification
- Dependent Type Theory
- Lean4
- Workflow Automation
- Software Engineering Benchmarks
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.