An Empirical Study of LLM-Generated Specifications for VeriFast

2026-06-25 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

An empirical study evaluated how well large language models (LLMs) generate specifications for VeriFast, a separation logic (SL) verifier. SL verifiers demand substantial manual effort, especially when heap-manipulating programs need complex auxiliary specifications. The study covered 303 C functions, eight prompting approaches, ten LLMs, and three input types across two stages. Quantitative and qualitative analyses showed that the LLMs preserved functional behavior in both source code and specifications in over 91% of cases, but reached only a 31.4% verification success rate. Using Gemini 2.5 Pro and providing formal contracts as input both improved success. Of the errors, 94% stemmed from the LLMs' lack of domain-specific knowledge about SL verifiers such as VeriFast, which points to where future LLM-based specification generation can be improved.
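To make "auxiliary specifications" concrete, here is a minimal sketch written in VeriFast's annotation style. It is not taken from the study's benchmark: the `counter` struct, the `counter_pred` predicate, and `counter_reset` are hypothetical names chosen for illustration. The predicate and the `open`/`close` ghost statements are the kind of extra annotation, beyond a plain pre/postcondition, that a human or an LLM has to supply.

```c
/* Hypothetical example; all names are illustrative, not from the study. */
struct counter {
    int value;
};

/* Auxiliary specification: a predicate that packages ownership of the
   counter's field together with the value it currently holds. */
/*@
predicate counter_pred(struct counter *c, int v) =
    c->value |-> v;
@*/

void counter_reset(struct counter *c)
//@ requires counter_pred(c, _);
//@ ensures counter_pred(c, 0);
{
    /* Ghost step: unfold the predicate to expose the field permission. */
    //@ open counter_pred(c, _);
    c->value = 0;
    /* Ghost step: fold the predicate back up with the new value. */
    //@ close counter_pred(c, 0);
}
```

Because the annotations live in comments, an ordinary C compiler ignores them; only the verifier reads them.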

Key takeaway

For research scientists building static verification tools, particularly around separation logic verifiers: LLMs can preserve functional behavior in generated specifications, but their verification success rate is currently modest (31.4%). Prioritize strategies that strengthen LLMs' domain-specific knowledge of SL verifiers, since gaps in that knowledge account for 94% of errors. Consider models such as Gemini 2.5 Pro, and provide formal contracts as input to improve verification success.

Key insights

LLMs can generate specifications for separation logic verifiers, but their domain-specific knowledge limitations lead to modest verification success.

Principles

LLMs preserve functional behavior in both code and specifications (over 91%).
Most LLM errors (94%) stem from a lack of SL domain-specific knowledge.
Providing formal contracts as input improves verification success.

Method

The study evaluated LLM-generated specifications for 303 C functions targeting VeriFast, an SL verifier, using 8 prompting approaches, 10 LLMs, and 3 input types across two stages.

In practice

Consider Gemini 2.5 Pro for SL specification tasks.
Supply formal contracts as input to LLMs.
Address LLMs' gaps in SL domain-specific knowledge.
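As a rough illustration of what a formal contract might look like when supplied as input, the sketch below attaches a precondition and postcondition to a small C function in VeriFast's annotation style. It assumes that "formal contract" means a `requires`/`ensures` pair; the digest does not define the term, and the `account` struct and `account_deposit` function are hypothetical, not drawn from the study.

```c
#include <limits.h>

/* Hypothetical example; all names are illustrative, not from the study. */
struct account {
    int balance;
};

/* Contract: the caller owns the balance field (points-to assertion),
   the amount is non-negative, and the sum cannot overflow an int.
   On return, the field holds the old balance plus the amount. */
void account_deposit(struct account *a, int amount)
//@ requires a->balance |-> ?b &*& 0 <= amount &*& b <= INT_MAX - amount;
//@ ensures a->balance |-> b + amount;
{
    a->balance += amount;
}
```

With a contract like this given up front, the remaining work is any auxiliary annotation needed inside the body, which is where SL domain knowledge matters.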

Topics

Large Language Models
Static Program Analysis
Separation Logic
VeriFast
Program Verification
Gemini 2.5 Pro

Best for: AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.