An Empirical Study of LLM-Generated Specifications for VeriFast
Summary
An empirical study evaluated how well large language models (LLMs) generate specifications for VeriFast, a separation logic (SL) verifier. SL verifiers typically demand significant human labor, especially for the complex auxiliary specifications that heap-manipulating programs require. The study covered 303 C functions and explored eight prompting approaches, ten LLMs, and three input types across two stages. Quantitative and qualitative analyses showed that LLMs preserved the functional behavior of both source code and specifications in over 91% of cases, but achieved only a modest 31.4% verification success rate. Using Gemini 2.5 Pro and providing formal contracts as input both improved success. Most errors (94%) stemmed from the LLMs' lack of domain-specific knowledge about SL verifiers like VeriFast, which indicates where future work on LLM-generated specifications should focus.
Key takeaway
For research scientists developing static verification tools, particularly around separation logic verifiers: this study indicates that LLMs can preserve functional behavior in generated specifications, but their current verification success rate is modest (31.4%). Prioritize strategies that strengthen LLMs' domain-specific knowledge of SL verifiers, since gaps in that knowledge account for 94% of the observed errors. Consider using models like Gemini 2.5 Pro and supplying formal contracts as input to improve verification success.
Key insights
LLMs can generate specifications for separation logic verifiers, but their domain-specific knowledge limitations lead to modest verification success.
Principles
- LLMs preserve functional behavior in code and specifications (>91%).
- Most LLM errors (94%) stem from a lack of SL domain-specific knowledge.
- Providing formal contracts as input improves LLM specification generation.
Method
The study evaluated LLM-generated specifications for 303 C functions targeting VeriFast, an SL verifier, across 8 prompting approaches, 10 LLMs, and 3 input types in two stages.
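To make concrete what a specification looks like in this setting, here is a minimal sketch of C functions carrying VeriFast-style annotations. The `counter` struct and both functions are hypothetical and are not drawn from the study's 303 functions. The annotations follow VeriFast's documented contract style (`requires`/`ensures`, the points-to operator `|->`, and the separating conjunction `&*&`) but have not been checked against a particular VeriFast release, so treat them as illustrative.

```c
#include <limits.h>

/* Hypothetical example type; not taken from the study's benchmark. */
struct counter {
    int value;
};

/* The contract says: the caller must own the field c->value, holding some
 * value v that can be incremented without overflow. On return, the caller
 * gets ownership back with the field holding v + 1. */
void counter_increment(struct counter *c)
    //@ requires c->value |-> ?v &*& v < INT_MAX;
    //@ ensures c->value |-> v + 1;
{
    c->value++;
}

/* Reading the field also requires ownership; the postcondition returns it
 * unchanged and relates the result to the stored value. */
int counter_get(struct counter *c)
    //@ requires c->value |-> ?v;
    //@ ensures c->value |-> v &*& result == v;
{
    return c->value;
}
```

Because the annotations live in `//@` comments, an ordinary C compiler ignores them and the functions behave as plain C. Heap-manipulating programs generally need more than this, such as predicates and lemmas, which is the kind of auxiliary specification the study describes as labor-intensive.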
In practice
- Consider Gemini 2.5 Pro for SL specification tasks.
- Supply formal contracts as input to LLMs.
- Address LLMs' SL domain-specific knowledge gaps.
Topics
- Large Language Models
- Static Program Analysis
- Separation Logic
- VeriFast
- Program Verification
- Gemini 2.5 Pro
Best for: AI Scientist, Research Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.