VeruSAGE: A Study of Agent-Based Verification for Rust Systems

2025-12-20 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Mathematics & Computational Sciences · Depth: Expert, quick

Summary

The paper "VeruSAGE: A Study of Agent-Based Verification for Rust Systems" investigates how well large language models (LLMs) can generate correctness proofs for system software written in Rust. The authors curated VeruSAGE-Bench, a new benchmark suite of 849 proof tasks derived from eight open-source Rust systems verified with Verus. They designed agent systems tailored to different LLMs (o4-mini, GPT-5, Sonnet 4, and Sonnet 4.5) to get the best system-verification performance from each. The findings indicate that different LLMs need different tool and agent configurations to perform well in this domain. The most effective LLM-agent combination completed over 80% of the VeruSAGE-Bench tasks and over 90% of system proof tasks that human experts had not yet finished, which points to significant potential for LLM-assisted development of verified system software.

Key takeaway

For AI scientists and research scientists working on formal verification, this study suggests that pairing LLMs with specialized agent systems can significantly accelerate the development of verified system software. Teams should explore agent-based LLM approaches for generating correctness proofs, particularly for Rust systems, including proof tasks that human experts have not yet completed.

Key insights

LLMs, when paired with specialized agent systems, can effectively generate correctness proofs for Rust system software.

Principles

An LLM's verification performance depends on the agent system it is paired with.
Tools and agent settings tailored to each LLM bring out its system-verification capability.

Method

The authors curated a benchmark (VeruSAGE-Bench) from existing Verus-verified Rust systems and designed agent systems to evaluate four LLMs (o4-mini, GPT-5, Sonnet 4, Sonnet 4.5) on its proof tasks.
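To make "agent system" concrete, here is a minimal sketch of one common pattern: propose a proof, run a verifier, and feed the errors back into the next attempt. This is an illustration only and not the design used in the paper. The names `prove_with_feedback`, `toy_propose`, and `toy_verify` are hypothetical, and the two toy functions are stand-ins for a real LLM call and a real Verus run.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class VerifyResult:
    ok: bool            # True when the verifier accepts the candidate
    errors: List[str]   # verifier messages to feed back on failure


def prove_with_feedback(
    task: str,
    propose: Callable[[str, List[str]], str],
    verify: Callable[[str], VerifyResult],
    max_rounds: int = 3,
) -> Optional[str]:
    """Return a candidate accepted by `verify`, or None if the budget runs out."""
    errors: List[str] = []
    for _ in range(max_rounds):
        candidate = propose(task, errors)   # hypothetical LLM call
        result = verify(candidate)          # hypothetical verifier run
        if result.ok:
            return candidate
        errors = result.errors              # next attempt sees the failures
    return None


# Toy stand-ins (assumptions, not real tools): the "verifier" accepts any
# candidate containing the word "assert", and the "model" only adds one
# after it has seen an error message.
def toy_verify(candidate: str) -> VerifyResult:
    if "assert" in candidate:
        return VerifyResult(True, [])
    return VerifyResult(False, ["postcondition not satisfied"])


def toy_propose(task: str, errors: List[str]) -> str:
    return f"proof for {task}" + (" with assert" if errors else "")


proof = prove_with_feedback("lemma_example", toy_propose, toy_verify)
print(proof)
```

With these stand-ins the first attempt fails and the second succeeds. Passing `propose` and `verify` as parameters keeps the loop independent of any particular model or tool set, which makes it easier to swap configurations per model.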

In practice

Use VeruSAGE-Bench to evaluate LLMs on system-verification proof tasks.
Try different agent systems and tools for each LLM when generating proofs.
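One way to organize such experiments is to log a pass/fail outcome for each (model, agent, task) run and compare completion rates per configuration. The sketch below assumes that log format. The labels `model-A`, `agent-1`, and so on, and all outcomes shown, are placeholders and not results from the study. The last lines only work out what "over 80%" of 849 tasks means in absolute terms.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Outcome = Tuple[str, str, bool]  # (model, agent, task completed?)


def completion_rates(outcomes: Iterable[Outcome]) -> Dict[Tuple[str, str], float]:
    """Fraction of tasks completed for each (model, agent) pair."""
    solved: Dict[Tuple[str, str], int] = defaultdict(int)
    total: Dict[Tuple[str, str], int] = defaultdict(int)
    for model, agent, ok in outcomes:
        total[(model, agent)] += 1
        solved[(model, agent)] += int(ok)
    return {key: solved[key] / total[key] for key in total}


def best_agent_per_model(rates: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Pick, for each model, the agent with the highest completion rate."""
    best: Dict[str, str] = {}
    for (model, agent), rate in rates.items():
        if model not in best or rate > rates[(model, best[model])]:
            best[model] = agent
    return best


# Placeholder outcomes for illustration only (not data from the paper).
placeholder_outcomes = [
    ("model-A", "agent-1", True), ("model-A", "agent-1", False),
    ("model-A", "agent-2", True), ("model-A", "agent-2", True),
    ("model-B", "agent-1", True), ("model-B", "agent-1", True),
    ("model-B", "agent-2", False), ("model-B", "agent-2", True),
]
rates = completion_rates(placeholder_outcomes)
best = best_agent_per_model(rates)
print(best)

# "Over 80%" of the 849 benchmark tasks means strictly more than 679.2 tasks.
BENCH_SIZE = 849
min_solved_over_80 = BENCH_SIZE * 80 // 100 + 1
print(min_solved_over_80)
```

In this placeholder data the best agent differs between the two models, which is the kind of per-model difference the principles above describe.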

Topics

Agent-Based Verification
Rust Systems
Large Language Models
Code Correctness Proofs
VeruSAGE-Bench

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.