An Empirical Study of LLM-Generated Specifications for VeriFast

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

An empirical study evaluated how well large language models (LLMs) generate specifications for VeriFast, a separation logic (SL) verifier. SL verifiers typically demand significant human labor, especially for the complex auxiliary specifications that heap-manipulating programs require. The study covered 303 C functions, eight prompting approaches, ten LLMs, and three input types across two stages. Quantitative and qualitative analyses showed that LLMs preserved the functional behavior of both source code and specifications in over 91% of cases, but achieved a verification success rate of only 31.4%. Using Gemini 2.5 Pro and providing formal contracts as input both improved success. Most errors (94%) stemmed from the LLMs' lack of domain-specific knowledge about SL verifiers such as VeriFast, which points to where future LLM-generated specifications can be improved.

Key takeaway

For research scientists building static verification tools, particularly around separation logic verifiers: LLMs largely preserve functional behavior in generated specifications, but their verification success rate is still modest (31.4%). Prioritize strategies that strengthen LLMs' domain-specific knowledge of SL verifiers. Consider using models such as Gemini 2.5 Pro and supplying formal contracts as input to improve verification success and to reduce the errors, 94% of which the study attributes to domain knowledge gaps.

Key insights

LLMs can generate specifications for separation logic verifiers, but their domain-specific knowledge limitations lead to modest verification success.

Method

The study evaluated LLM-generated specifications for 303 C functions across two stages, using 8 prompting approaches, 10 LLMs, and 3 input types, with VeriFast as the SL verifier.
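To make concrete what an LLM has to produce, here is a minimal sketch of a VeriFast-annotated C file. It is not taken from the study's benchmark: the `account` struct, the `account_pred` predicate, and the three functions are hypothetical names chosen for illustration, and the annotations follow the style of the VeriFast tutorial without having been checked against a specific VeriFast release. The `requires`/`ensures` clauses are the contract, while the predicate definition and the `open`/`close` ghost commands are the kind of auxiliary specification that demands SL-specific knowledge.

```c
#include <limits.h>
#include <stdlib.h>

struct account {
    int balance;
};

/*@
// Auxiliary specification (hypothetical): a predicate bundling the heap
// chunks owned by an account, i.e. its field and its malloc block.
predicate account_pred(struct account *a, int balance) =
    a->balance |-> balance &*& malloc_block_account(a);
@*/

struct account *account_create()
//@ requires true;
//@ ensures account_pred(result, 0);
{
    struct account *a = malloc(sizeof(struct account));
    if (a == 0) { abort(); }
    a->balance = 0;
    //@ close account_pred(a, 0);
    return a;
}

void account_deposit(struct account *a, int amount)
//@ requires account_pred(a, ?b) &*& 0 <= amount &*& b <= INT_MAX - amount;
//@ ensures account_pred(a, b + amount);
{
    //@ open account_pred(a, b);
    a->balance += amount;
    //@ close account_pred(a, b + amount);
}

void account_dispose(struct account *a)
//@ requires account_pred(a, _);
//@ ensures true;
{
    //@ open account_pred(a, _);
    free(a);
}
```

Because the annotations live in comments, the file compiles as ordinary C. A verifier run can still fail if, for example, an `open` or `close` is missing or the overflow precondition on `account_deposit` is omitted, even when the contract describes the intended behavior correctly.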

Best for: AI Scientist, Research Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.