Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Lean4Agent is presented as the first framework to use a dependently typed formal language, Lean4, to uniformly model and verify Large Language Model (LLM) agent workflows and execution trajectories. It comprises FormalAgentLib, an extensible Lean4 library that formally models agent workflows, verifies their semantic consistency under explicit assumptions, and localizes execution-time failures. Building on this, LeanEvolve refines workflows using verification feedback and optional environment signals. In experiments on a hard-problem subset of SWE-Bench-Verified (Jimenez et al., 2024) and a subset of ELAIP-Bench (Dai et al., 2025), across five leading LLMs including GPT-5.2 (OpenAI, 2025) and GLM-5 (Zeng et al., 2026), workflows that passed verification outperformed those that failed by an average of 11.94%, and LeanEvolve further improved SWE performance by an average of 7.47%.

Key takeaway

AI architects and machine learning engineers deploying LLM agents in high-stakes domains should consider integrating formal verification frameworks such as Lean4Agent. In the reported experiments, this approach, built on a dependently typed language, improved workflow reliability and task performance by identifying structural and semantic inconsistencies before deployment and by guiding runtime refinements. Such a system may also reduce debugging cycles and make autonomous agent systems more trustworthy.

Key insights

Lean4Agent uses a dependently typed formal language (Lean4) to verify and refine LLM agent workflows and trajectories.

Principles

Method

Lean4Agent uses FormalAgentLib to verify workflows in Lean4 at three layers: structural, semantic, and trajectory. It then uses LeanEvolve, a dual-mode approach, to refine workflows based on verification diagnostics and environment feedback.
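
To make the three layers concrete, here is a minimal sketch in plain Python rather than Lean4. It is an analogue under stated assumptions, not FormalAgentLib's API: the summary does not define the layers precisely, so the interpretation of each check, the linear-workflow model, and every name (Step, check_structure, check_semantics, localize_failure) are hypothetical. In Lean4Agent these properties would be expressed and checked in Lean4 rather than at runtime.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One workflow node (hypothetical model, not FormalAgentLib's API)."""
    name: str
    requires: frozenset = frozenset()  # facts that must hold before the step
    provides: frozenset = frozenset()  # facts the step establishes
    next_steps: tuple = ()             # names of successor steps


def check_structure(steps):
    """Structural layer (assumed meaning): unique names, no dangling edges."""
    names = [s.name for s in steps]
    errors = [f"duplicate step: {n}" for n in sorted(set(names)) if names.count(n) > 1]
    for s in steps:
        for target in s.next_steps:
            if target not in names:
                errors.append(f"{s.name} -> {target}: undefined step")
    return errors


def check_semantics(steps, assumptions):
    """Semantic layer (assumed meaning): walking the steps in order, every
    requirement must be covered by the explicit assumptions or by the
    output of an earlier step."""
    known = set(assumptions)
    errors = []
    for s in steps:
        missing = s.requires - known
        if missing:
            errors.append(f"{s.name}: missing {sorted(missing)}")
        known |= s.provides
    return errors


def localize_failure(steps, trajectory):
    """Trajectory layer (assumed meaning): return the index of the first
    executed event that departs from the modeled step order, or None if
    the trace is a prefix of that order."""
    expected = [s.name for s in steps]
    for i, event in enumerate(trajectory):
        if i >= len(expected) or event != expected[i]:
            return i
    return None


# A toy three-step repair workflow; "test" needs a test suite that no
# earlier step provides.
workflow = [
    Step("locate", frozenset({"issue"}), frozenset({"file"}), ("patch",)),
    Step("patch", frozenset({"file"}), frozenset({"diff"}), ("test",)),
    Step("test", frozenset({"diff", "test_suite"}), frozenset({"verdict"})),
]

structure_errors = check_structure(workflow)
semantic_errors = check_semantics(workflow, assumptions={"issue"})
first_bad_event = localize_failure(workflow, ["locate", "test", "patch"])
```

In this toy run the structural check passes, the semantic check reports that "test" depends on a fact that is neither assumed nor provided, and the trajectory check flags the second event as the first deviation. Adding "test_suite" to the explicit assumptions makes the semantic check pass, which mirrors the idea of verifying consistency under explicit assumptions.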


Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, AI Architect


Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.