AxDafny: Agentic Verified Code Generation in Dafny
Summary
AxDafny is a verifier-guided repair framework for agentic code generation in Dafny, where a model must produce both executable code and the proof artifacts that formal verification requires. The system iteratively generates implementations, invariants, assertions, and termination arguments, repairing them from verifier feedback until the code verifies. The authors also introduce LiveCodeBench-Pro-Dafny (LCB-Pro-Dafny), a new benchmark of 250 competition-style programming problems translated into Dafny, each with a formal specification and a verifier-based evaluation harness. On LCB-Pro-Dafny, AxDafny substantially improves verification success over a baseline GPT-5.5. On DafnyBench, AxDafny reaches a 92.7% verification success rate, 6.5 percentage points above the strongest previously reported proof-hint baseline (about 86.2%). The study also finds that verification success and runtime test performance measure distinct aspects of generated code.
Key takeaway
For machine learning engineers building code generation agents for critical applications, the lesson is to integrate formal verification tools and iterative repair mechanisms into the workflow. AxDafny's 92.7% verification success on DafnyBench shows what this approach can achieve. Verification checks code against a formal specification, which complements rather than duplicates runtime testing. Consider building verifier-guided feedback loops that iteratively refine generated implementations, invariants, and proofs to make AI-generated code more trustworthy.
Key insights
Iterative, verifier-guided repair significantly improves agentic code generation for formal verification in Dafny.
Principles
- Iterative verifier feedback improves code verification.
- Formal verification and runtime performance measure distinct code qualities.
- Agentic models can generate both code and proof artifacts.
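To make the distinction between the two metrics concrete, here is a minimal Python sketch that scores each problem on both axes and reports how often they disagree. The `Evaluation` record and `summarize` helper are hypothetical names, not part of AxDafny or LCB-Pro-Dafny, and the records in the usage section are toy data, not results from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Evaluation:
    problem_id: str
    verified: bool      # the Dafny verifier accepted the code and proofs against the spec
    tests_passed: bool  # the compiled code passed the runtime test cases


def summarize(evals: List[Evaluation]) -> Dict[str, float]:
    """Report both metrics separately, plus how often they disagree."""
    if not evals:
        raise ValueError("no evaluations to summarize")
    n = len(evals)
    return {
        "verification_rate": sum(e.verified for e in evals) / n,
        "test_pass_rate": sum(e.tests_passed for e in evals) / n,
        # e.g. code that passes tests but lacks the proof artifacts needed to verify
        "tests_only": sum(e.tests_passed and not e.verified for e in evals) / n,
        # e.g. code proven against a spec that omits behavior the tests check
        "verified_only": sum(e.verified and not e.tests_passed for e in evals) / n,
    }


if __name__ == "__main__":
    toy = [  # toy records for illustration only, not data from the paper
        Evaluation("p1", verified=True, tests_passed=True),
        Evaluation("p2", verified=False, tests_passed=True),
        Evaluation("p3", verified=True, tests_passed=False),
        Evaluation("p4", verified=False, tests_passed=False),
    ]
    print(summarize(toy))
```

Reporting the two disagreement buckets separately, rather than a single combined score, keeps each failure mode visible.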
Method
AxDafny employs a verifier-guided repair framework: it iteratively generates Dafny implementations, invariants, assertions, and termination arguments (decreases clauses), using verifier feedback to repair them until the code verifies.
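The repair loop described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not AxDafny's implementation: `generate` stands in for an LLM call and `verify` for a Dafny verifier run, and `repair_loop`, `VerifierResult`, `RepairOutcome`, `toy_generate`, and `toy_verify` are all hypothetical names. The toy verifier only checks for the keywords `invariant` and `decreases` as a stand-in for real proof obligations.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class VerifierResult:
    """Outcome of one verifier run (hypothetical structure)."""
    verified: bool
    errors: List[str] = field(default_factory=list)


@dataclass
class RepairOutcome:
    program: Optional[str]  # last candidate produced, verified or not
    verified: bool
    attempts: int
    history: List[VerifierResult] = field(default_factory=list)


def repair_loop(
    spec: str,
    generate: Callable[[str, Optional[str], List[str]], str],
    verify: Callable[[str], VerifierResult],
    max_attempts: int = 5,
) -> RepairOutcome:
    """Generate, verify, and feed verifier errors back until success or budget exhaustion."""
    program: Optional[str] = None
    errors: List[str] = []
    history: List[VerifierResult] = []
    for attempt in range(1, max_attempts + 1):
        # The generator sees the spec, its previous candidate, and the verifier's errors.
        program = generate(spec, program, errors)
        result = verify(program)
        history.append(result)
        if result.verified:
            return RepairOutcome(program, True, attempt, history)
        errors = result.errors
    return RepairOutcome(program, False, max_attempts, history)


# Toy stand-ins (hypothetical): a real setup would call an LLM and the Dafny verifier.
def toy_verify(program: str) -> VerifierResult:
    missing = [kw for kw in ("invariant", "decreases") if kw not in program]
    return VerifierResult(verified=not missing, errors=[f"missing {kw}" for kw in missing])


def toy_generate(spec: str, previous: Optional[str], errors: List[str]) -> str:
    if previous is None:
        return "while i < n { i := i + 1; }"  # first draft with no proof artifacts
    # Naive "repair": mention each missing artifact the verifier reported.
    added = " ".join(e.removeprefix("missing ") for e in errors)
    return f"{previous} /* {added} */"


if __name__ == "__main__":
    outcome = repair_loop("ensures i == n", toy_generate, toy_verify)
    print(outcome.verified, outcome.attempts)  # True 2
```

Passing the previous candidate and the verifier's error messages back to the generator is what distinguishes this from independent resampling; `max_attempts` bounds cost when a problem never verifies.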
In practice
- Integrate verifier feedback into AI code generation workflows.
- Create specialized benchmarks for formally verified code.
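A verifier-based evaluation harness for such a benchmark also has to rule out candidates that verify trivially. The Python sketch below shows one hypothetical way to do that: it rejects candidates containing Dafny escape hatches such as `assume` statements or the `{:verify false}` attribute, checks that the original specification lines survive verbatim, and only then runs the verifier. It assumes the Dafny 4 command-line tool is installed and on `PATH` (`dafny verify <file>`, treating exit code 0 as success). The function names, the escape-hatch list, and the verbatim spec check are illustrative assumptions, not the LCB-Pro-Dafny harness.

```python
import re
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple

# Constructs that let a candidate "verify" without proving anything (assumed list, not exhaustive).
ESCAPE_HATCHES = [r"\bassume\b", r"\{:verify\s+false\}"]


def find_escape_hatches(program: str) -> List[str]:
    return [pattern for pattern in ESCAPE_HATCHES if re.search(pattern, program)]


def spec_preserved(program: str, required_spec_lines: List[str]) -> bool:
    """Crude check that every original spec line (e.g. requires/ensures) still appears verbatim."""
    return all(line.strip() in program for line in required_spec_lines)


def run_dafny_verify(program: str, timeout_s: float = 60.0) -> Tuple[bool, str]:
    """Run `dafny verify` on a candidate; assumes the Dafny 4 CLI is on PATH."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "candidate.dfy"
        path.write_text(program)
        try:
            proc = subprocess.run(
                ["dafny", "verify", str(path)],
                capture_output=True, text=True, timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return False, "timeout"
        return proc.returncode == 0, proc.stdout + proc.stderr


def judge(
    program: str,
    required_spec_lines: List[str],
    verifier: Callable[[str], Tuple[bool, str]] = run_dafny_verify,
) -> str:
    """Reject trivially verifying candidates before spending a verifier run on them."""
    if find_escape_hatches(program):
        return "rejected: escape hatch"
    if not spec_preserved(program, required_spec_lines):
        return "rejected: spec modified"
    ok, _log = verifier(program)
    return "verified" if ok else "failed"
```

The regex check is deliberately crude: it also flags `assume` inside comments, which errs on the side of rejection. Injecting `verifier` as a parameter lets the acceptance logic be tested without a Dafny installation.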
Topics
- AxDafny
- Agentic Code Generation
- Formal Verification
- Dafny Programming Language
- Code Benchmarking
- Program Synthesis
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.