Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

2026-05-03 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Lean4Agent is presented as the first framework to apply a dependently typed formal language, specifically Lean4, to uniformly model and verify Large Language Model (LLM) agent workflows and execution trajectories. It comprises FormalAgentLib, an extensible Lean4 library that formally models agent workflows, verifies their semantic consistency under explicit assumptions, and localizes execution-time failures. Building on this, LeanEvolve refines workflows using verification feedback and optional environment signals. Experiments were run on a hard-problem subset of SWE-Bench-Verified (Jimenez et al., 2024) and a subset of ELAIP-Bench (Dai et al., 2025), across five leading LLMs including GPT-5.2 (OpenAI, 2025) and GLM-5 (Zeng et al., 2026). Verification-passing workflows outperformed failing ones by an average of 11.94%, and LeanEvolve further boosted SWE performance by an average of 7.47%.

Key takeaway

AI architects and machine learning engineers deploying LLM agents in high-stakes domains should consider integrating formal verification frameworks such as Lean4Agent. In the reported experiments, this approach, built on a dependently typed language, improved workflow reliability and task performance by identifying structural and semantic inconsistencies before deployment and by guiding runtime refinements. Adopting such a system may reduce debugging cycles and make autonomous agent systems more trustworthy.

Key insights

Lean4Agent uses a dependently typed formal language (Lean4) to verify and refine LLM agent workflows and trajectories.

Principles

Formal methods enhance LLM agent reliability.
Dependently typed languages can model complex agent behaviors.
Verification-guided refinement improves workflow performance.

Method

Lean4Agent employs FormalAgentLib for three-layer verification (structural, semantic, trajectory) in Lean4. It then uses LeanEvolve, a dual-mode approach, to refine workflows based on verification diagnostics and, optionally, environment feedback.
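
The verify-then-refine loop described above can be sketched in Lean 4 as follows. This is a minimal illustration under stated assumptions, not code from FormalAgentLib or LeanEvolve: the names `Layer`, `Diagnostic`, `verify`, `evolve`, `hasSteps`, and `hasTest`, the representation of a workflow as a list of step names, and the fuel-bounded loop are all hypothetical choices made for this example.

```lean
-- Hypothetical sketch: none of these names come from the Lean4Agent paper.

/-- The verification layer that rejected a workflow. -/
inductive Layer where
  | structural
  | semantic
  | trajectory
  deriving Repr, DecidableEq

/-- A failure report that a refinement step can act on. -/
structure Diagnostic where
  layer : Layer
  message : String
  deriving Repr

/-- Run the checks in order and stop at the first one that fails. -/
def verify {W : Type} : List (W → Except Diagnostic Unit) → W → Except Diagnostic Unit
  | [], _ => .ok ()
  | c :: cs, w =>
    match c w with
    | .ok () => verify cs w
    | .error d => .error d

/-- Verify, repair from the diagnostic, and retry at most `fuel` times.
    Returns the workflow once it passes, or the last diagnostic otherwise. -/
def evolve {W : Type} (checks : List (W → Except Diagnostic Unit))
    (repair : W → Diagnostic → W) : Nat → W → Except Diagnostic W
  | 0, w =>
    match verify checks w with
    | .ok () => .ok w
    | .error d => .error d
  | fuel + 1, w =>
    match verify checks w with
    | .ok () => .ok w
    | .error d => evolve checks repair fuel (repair w d)

/-- Toy structural check: a workflow must contain at least one step. -/
def hasSteps : List String → Except Diagnostic Unit
  | [] => .error (Diagnostic.mk .structural "workflow has no steps")
  | _ :: _ => .ok ()

/-- Toy semantic check: a workflow must include a step named "test". -/
def hasTest (w : List String) : Except Diagnostic Unit :=
  if w.contains "test" then .ok ()
  else .error (Diagnostic.mk .semantic "workflow never runs the tests")

-- The first pass fails the semantic check, the repair appends a "test" step,
-- and the second pass succeeds with the workflow ["plan", "edit", "test"].
#eval evolve [hasSteps, hasTest] (fun w _ => w ++ ["test"]) 3 ["plan", "edit"]
```

The fuel argument keeps the loop total, which Lean requires of ordinary definitions. In the paper's setting the repair step would presumably be driven by an LLM and, in the second mode, by environment signals; here it is a pure function so the sketch stays self-contained.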

In practice

Use Lean4 to define agent workflow types.
Apply Hoare-style contracts for semantic checks.
Localize LLM agent failures via trajectory analysis.
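
The three practices above can be illustrated with a small Lean 4 sketch. Every name in it (`Artifact`, `Step`, `Workflow`, `Triple`, `Event`, `firstFailure`) is hypothetical and chosen for this example; it shows the general techniques under simplifying assumptions and does not reproduce the FormalAgentLib API.

```lean
-- Hypothetical sketch: illustrative names only, not the FormalAgentLib API.

/-- Kinds of data passed between agent steps. -/
inductive Artifact where
  | issue | plan | patch | testReport
  deriving Repr, DecidableEq

/-- A step indexed by the artifact it consumes and the artifact it produces. -/
structure Step (input output : Artifact) where
  name : String

/-- A workflow is a chain of steps whose input and output types line up,
    so a mis-wired pipeline cannot even be constructed. -/
inductive Workflow : Artifact → Artifact → Type where
  | nil : (a : Artifact) → Workflow a a
  | cons : {a b c : Artifact} → Step a b → Workflow b c → Workflow a c

def planStep : Step .issue .plan := ⟨"plan"⟩
def editStep : Step .plan .patch := ⟨"edit"⟩
def testStep : Step .patch .testReport := ⟨"test"⟩

/-- Type-checks because each step's output matches the next step's input.
    Swapping `planStep` and `editStep` would be rejected by the type checker. -/
def sweWorkflow : Workflow .issue .testReport :=
  .cons planStep (.cons editStep (.cons testStep (.nil .testReport)))

/-- Hoare-style contract: if `pre` holds before `f` runs, `post` holds after. -/
def Triple {σ : Type} (pre : σ → Prop) (f : σ → σ) (post : σ → Prop) : Prop :=
  ∀ s, pre s → post (f s)

/-- Sequencing rule: contracts compose when the first postcondition
    is the second precondition. -/
theorem Triple.seq {σ : Type} {p q r : σ → Prop} {f g : σ → σ}
    (h₁ : Triple p f q) (h₂ : Triple q g r) :
    Triple p (fun s => g (f s)) r :=
  fun s hp => h₂ (f s) (h₁ s hp)

/-- One recorded trajectory event: the step name and whether its
    postcondition held at run time. -/
structure Event where
  step : String
  postOk : Bool

/-- Localize a failure: return the first event whose contract was violated. -/
def firstFailure (events : List Event) : Option Event :=
  events.find? (fun e => !e.postOk)

#eval (firstFailure [⟨"plan", true⟩, ⟨"edit", false⟩, ⟨"test", false⟩]).map Event.step
-- some "edit"
```

Indexing `Workflow` by its input and output artifacts is what makes the structural check a matter of type checking, while `Triple` states semantic obligations as propositions that need proofs. The trajectory check works on recorded Boolean outcomes because run-time behavior is observed rather than proved.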

Topics

LLM Agents
Formal Verification
Dependent Type Theory
Lean4
Workflow Automation
Software Engineering Benchmarks

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.