VeruSAGE: A Study of Agent-Based Verification for Rust Systems
Summary
The study "VeruSAGE: A Study of Agent-Based Verification for Rust Systems" investigates how well large language models (LLMs) can generate correctness proofs for Rust system software. The researchers curated VeruSAGE-Bench, a new benchmark of 849 proof tasks drawn from eight open-source Rust systems verified with Verus, a formal verification tool for Rust. They then designed agent systems tailored to individual LLMs, including o4-mini, GPT-5, Sonnet 4, and Sonnet 4.5, to get the best system-verification performance out of each one. The findings indicate that the choice of tools and agent configuration is crucial to eliciting LLM capabilities in this domain. The best LLM-agent combination completed over 80% of VeruSAGE-Bench tasks. It also completed over 90% of the system proof tasks that human experts had not yet finished, which points to strong potential for LLM-assisted development of verified system software.
Key takeaway
For AI Scientists and Research Scientists working on formal verification, this study suggests that pairing LLMs with specialized agent systems can substantially accelerate the development of verified system software. Your teams should explore agent-based LLM approaches to generating correctness proofs, particularly for Rust systems. In this study, the best configuration completed over 90% of the proof tasks that human experts had not yet finished.
Key insights
LLMs, when paired with specialized agent systems, can effectively generate correctness proofs for Rust system software.
Principles
- LLM performance varies by agent system.
- Tools tailored to each model help elicit its verification capabilities.
Method
The study curated a benchmark (VeruSAGE-Bench) of 849 proof tasks from eight existing Verus-verified Rust systems. The authors then designed agent systems to evaluate how well several LLMs (o4-mini, GPT-5, Sonnet 4, Sonnet 4.5) complete these proof tasks.
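To make "agent system" and "completion rate" concrete, here is a minimal, hypothetical sketch of a generic pattern that LLM proof agents commonly build on. The model proposes a candidate proof, a verifier checks it, and any error message is fed back until the proof is accepted or an attempt budget runs out. This is an assumption for illustration only and is not the agent design described in the paper. `ProofTask`, `propose_proof`, and `run_verifier` are hypothetical stubs standing in for an LLM call and a Verus run, and the code is plain Rust rather than Verus.

```rust
// Hypothetical sketch of a generic propose -> verify -> feed-back loop.
// ProofTask, propose_proof, and run_verifier are illustrative stubs,
// NOT VeruSAGE's agent design and NOT a Verus API.

struct ProofTask {
    name: &'static str,
    // Simulation only: the stub verifier accepts on this attempt number.
    attempts_needed: u32,
}

/// Stub for an LLM call: builds a candidate proof, using prior verifier feedback if any.
fn propose_proof(task: &ProofTask, feedback: Option<&str>) -> String {
    match feedback {
        None => format!("proof for {}", task.name),
        Some(err) => format!("revised proof for {} addressing: {}", task.name, err),
    }
}

/// Stub for a verifier run: Ok(()) if accepted, Err(message) otherwise.
fn run_verifier(task: &ProofTask, _candidate: &str, attempt: u32) -> Result<(), String> {
    if attempt >= task.attempts_needed {
        Ok(())
    } else {
        Err(format!("attempt {} rejected", attempt))
    }
}

/// Retries up to `budget` times, passing each error back to the proposer.
fn solve(task: &ProofTask, budget: u32) -> bool {
    let mut feedback: Option<String> = None;
    for attempt in 1..=budget {
        let candidate = propose_proof(task, feedback.as_deref());
        match run_verifier(task, &candidate, attempt) {
            Ok(()) => return true,
            Err(msg) => feedback = Some(msg),
        }
    }
    false
}

/// Completion rate: solved tasks as a percentage of all tasks.
fn completion_rate(tasks: &[ProofTask], budget: u32) -> f64 {
    if tasks.is_empty() {
        return 0.0;
    }
    let solved = tasks.iter().filter(|t| solve(t, budget)).count();
    100.0 * solved as f64 / tasks.len() as f64
}

fn main() {
    let tasks = [
        ProofTask { name: "a", attempts_needed: 1 },
        ProofTask { name: "b", attempts_needed: 3 },
        ProofTask { name: "c", attempts_needed: 5 },
        ProofTask { name: "d", attempts_needed: 2 },
    ];
    println!("completion: {:.1}%", completion_rate(&tasks, 3));
}
```

With a budget of three attempts, the stub verifier accepts tasks a, b, and d but not c, so the program prints `completion: 75.0%`. Real agent systems differ mainly in what replaces these stubs: which tools the model can call and how verifier feedback is presented. The study reports that these choices matter and that the best choices differ from model to model.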
In practice
- Use VeruSAGE-Bench to evaluate LLMs on system-verification tasks.
- Experiment with agent systems for LLM proof generation.
Topics
- Agent-Based Verification
- Rust Systems
- Large Language Models
- Code Correctness Proofs
- VeruSAGE-Bench
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.