AxDafny: Agentic Verified Code Generation in Dafny

2026-06-30 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

AxDafny is a verifier-guided repair framework for agentic code generation in Dafny, where a model must produce both executable code and the proof artifacts needed for formal verification. The system iteratively generates implementations, invariants, assertions, and termination arguments with the goal of producing code that verifies. The researchers also introduce LiveCodeBench-Pro-Dafny (LCB-Pro-Dafny), a benchmark of 250 competition-style programming problems translated into Dafny with formal specifications and a verifier-based evaluation harness. On LCB-Pro-Dafny, AxDafny significantly improves verification success over baseline GPT-5.5 performance. On DafnyBench, it reaches a 92.7% verification success rate, 6.5 percentage points above the strongest previously reported proof-hint baseline. The study also finds that verification success and runtime test performance evaluate distinct aspects of generated code.

Key takeaway

Machine learning engineers building code generation agents for critical applications should integrate formal verification tools and iterative repair mechanisms into their workflow. As AxDafny's 92.7% verification success on DafnyBench illustrates, this approach gives evidence of correctness beyond what runtime testing alone provides. Consider building verifier-guided feedback loops that iteratively refine generated implementations, invariants, and proofs, which makes AI-generated code more trustworthy.

Key insights

Iterative, verifier-guided repair significantly improves agentic code generation for formal verification in Dafny.

Principles

Iterative verifier feedback improves code verification.
Formal verification and runtime performance measure distinct code qualities.
Agentic models can generate both code and proof artifacts.
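
One way to act on the second principle is to track verification and runtime test outcomes as separate metrics per problem, rather than folding them into a single pass rate. The sketch below is illustrative only: the Outcome record, the rates helper, and the sample data are hypothetical and are not taken from the AxDafny evaluation harness or its results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Outcome:
    """Hypothetical per-problem record; field names are assumptions."""
    problem_id: str
    verified: bool       # did the verifier accept the program and its proofs?
    tests_passed: bool   # did the compiled program pass the runtime tests?


def rates(outcomes: List[Outcome]) -> Dict[str, float]:
    """Report verification and test pass rates separately, plus how often they disagree."""
    n = len(outcomes)
    if n == 0:
        return {"verified": 0.0, "tests_passed": 0.0, "disagree": 0.0}
    verified = sum(o.verified for o in outcomes)
    passed = sum(o.tests_passed for o in outcomes)
    disagree = sum(o.verified != o.tests_passed for o in outcomes)
    return {
        "verified": verified / n,
        "tests_passed": passed / n,
        "disagree": disagree / n,
    }


# Made-up sample data, for illustration only.
sample = [
    Outcome("a", verified=True, tests_passed=True),
    Outcome("b", verified=True, tests_passed=False),
    Outcome("c", verified=False, tests_passed=True),
    Outcome("d", verified=False, tests_passed=True),
]
summary = rates(sample)
```

Keeping the "disagree" fraction visible makes it easier to notice cases where, for example, a verified solution still fails runtime tests or a solution that passes tests has no accepted proof.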

Method

AxDafny employs a verifier-guided repair framework that iteratively generates Dafny implementations, invariants, assertions, and termination arguments to achieve verified code.
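
The general shape of a verifier-guided repair loop can be sketched as follows. This is a minimal sketch under stated assumptions, not AxDafny's actual implementation: the names repair_loop, VerifierResult, generate, and verify are hypothetical, the generator stands in for a model call, and the verifier stands in for whatever wraps the Dafny verifier. The round budget of 5 is likewise an arbitrary placeholder.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VerifierResult:
    """Hypothetical verifier outcome: success flag plus diagnostic messages."""
    ok: bool
    errors: List[str]


def repair_loop(
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str], VerifierResult],
    spec: str,
    max_rounds: int = 5,
) -> Optional[str]:
    """Generate a candidate, verify it, and feed errors back until success or budget exhaustion.

    generate(spec, feedback) is assumed to return a full candidate program,
    including implementation, invariants, assertions, and termination arguments.
    verify(candidate) is assumed to run the verifier and collect its diagnostics.
    """
    feedback: List[str] = []
    for _ in range(max_rounds):
        candidate = generate(spec, feedback)
        result = verify(candidate)
        if result.ok:
            return candidate          # verified candidate found
        feedback = result.errors      # next round sees the latest verifier errors
    return None                       # no verified candidate within the budget
```

Passing the generator and verifier in as callables keeps the loop testable with stubs and leaves open how each is implemented, for instance whether the feedback includes only the latest errors (as here) or the full history of attempts.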

In practice

Integrate verifier feedback into AI code generation workflows.
Create specialized benchmarks for formally verified code.

Topics

AxDafny
Agentic Code Generation
Formal Verification
Dafny Programming Language
Code Benchmarking
Program Synthesis

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.