AxDafny: Agentic Verified Code Generation in Dafny

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

AxDafny is a verifier-guided repair framework for agentic code generation in Dafny, where a model must produce both executable code and the proof artifacts needed for formal verification. The system iteratively generates implementations, invariants, assertions, and termination arguments with the goal of producing verified code. The authors also introduce LiveCodeBench-Pro-Dafny (LCB-Pro-Dafny), a benchmark of 250 competition-style programming problems translated into Dafny with formal specifications and a verifier-based evaluation harness. On LCB-Pro-Dafny, AxDafny improves verification success over baseline GPT-5.5 performance. On DafnyBench, it reaches a 92.7% verification success rate, 6.5 percentage points above the strongest previously reported proof-hint baseline. The study also finds that verification success and runtime test performance measure distinct aspects of generated code.

Key takeaway

Machine learning engineers building code generation agents for critical applications should integrate formal verification tools and iterative repair into their workflow. AxDafny's 92.7% verification success on DafnyBench suggests that this approach provides correctness evidence that runtime testing alone does not. Consider building verifier-guided feedback loops that iteratively refine generated implementations, invariants, and proofs, so that AI-generated code is easier to trust.

Key insights

Iterative, verifier-guided repair improves verification success for agentic code generation in Dafny.

Principles

Method

AxDafny employs a verifier-guided repair framework that iteratively generates Dafny implementations, invariants, assertions, and termination arguments to achieve verified code.
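
The loop described above can be sketched roughly as follows. This is a minimal illustration, not AxDafny's actual implementation: the names `repair_loop`, `VerifierResult`, `stub_generate`, and `stub_verify` are hypothetical, the generator stands in for a model call, and the verifier is a stub that only checks for the presence of a loop invariant rather than running Dafny. The error message it returns is also invented for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class VerifierResult:
    ok: bool
    errors: List[str] = field(default_factory=list)


def repair_loop(
    spec: str,
    generate: Callable[[str, List[str]], str],
    verify: Callable[[str], VerifierResult],
    max_rounds: int = 5,
) -> Tuple[Optional[str], int]:
    """Generate a candidate, verify it, and feed errors back until it passes.

    Returns (program, rounds_used); program is None if no round verified.
    """
    feedback: List[str] = []
    for round_no in range(1, max_rounds + 1):
        candidate = generate(spec, feedback)
        result = verify(candidate)
        if result.ok:
            return candidate, round_no
        feedback = result.errors  # verifier output guides the next attempt
    return None, max_rounds


# Hypothetical Dafny specification used only for this illustration.
SPEC = "method CountTo(n: nat) returns (r: nat)\n  ensures r == n\n"


def stub_generate(spec: str, feedback: List[str]) -> str:
    # Stand-in for a model: adds a loop invariant only after seeing feedback.
    invariant = "    invariant r <= n\n" if feedback else ""
    return (
        spec
        + "{\n  r := 0;\n  while r < n\n"
        + invariant
        + "  {\n    r := r + 1;\n  }\n}\n"
    )


def stub_verify(candidate: str) -> VerifierResult:
    # Stand-in for the Dafny verifier: accepts any candidate with an invariant.
    if "invariant" in candidate:
        return VerifierResult(ok=True)
    return VerifierResult(ok=False, errors=["stub: postcondition not proved"])


if __name__ == "__main__":
    program, rounds = repair_loop(SPEC, stub_generate, stub_verify)
    print(f"verified after {rounds} round(s)")
    print(program)
```

In a real setting, `verify` would invoke the Dafny verifier on the candidate and parse its diagnostics, and `generate` would prompt a model with the specification plus those diagnostics. Keeping both as plain callables makes the loop easy to test in isolation.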

Topics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.