Code-Augur: Agentic Vulnerability Detection via Specification Inference

Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering · Depth: Expert, extended

Summary

Code-Augur is an agentic vulnerability detection system built on a security-specification-first paradigm. It surfaces an LLM agent's tacit assumptions as explicit in-source security specifications, then continuously refines them through runtime falsification with a guided fuzzer. Implemented in 9.1K lines of TypeScript, it integrates with off-the-shelf grey-box fuzzers for multiple languages. On the AIxCC and OSV benchmarks it detected 34% to 370% more bugs than state-of-the-art agentic systems such as Claude Code and Atlantis. It also discovered 22 new vulnerabilities in widely used open-source projects, 16 of which have already been fixed or confirmed by developers. With open-weight LLMs such as DeepSeek V4 Pro, it achieved results comparable to or better than other agents running more expensive frontier models such as Claude Sonnet 4.6. The inferred invariants persist as durable artifacts that support long-term regression checking.

Key takeaway

AI security engineers building autonomous vulnerability detection systems should consider a specification-first approach. In the reported evaluation, committing LLM-inferred security invariants to source code and continuously validating them with guided fuzzing yielded 34% to 370% more detected bugs than the baseline agents. The approach uncovers more vulnerabilities, including previously unknown ones, and leaves behind durable security specifications that support regression prevention. Consider integrating this reason-falsify-refine loop into your audit workflows.
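To make "committing an invariant into source code" concrete, here is a minimal TypeScript sketch of what an in-source security specification could look like. The helper `securityInvariant` and the function `readField` are hypothetical names chosen for illustration; the summary does not describe Code-Augur's actual annotation syntax.

```typescript
// Hypothetical marker for a machine-inferred security specification.
// Not Code-Augur's real API; the name and message format are assumptions.
function securityInvariant(condition: boolean, spec: string): asserts condition {
  if (!condition) {
    throw new Error(`security invariant violated: ${spec}`);
  }
}

// Illustrative target: extract a length-prefixed field from a buffer.
function readField(buf: Uint8Array, offset: number, length: number): Uint8Array {
  // Assumptions an auditing agent might otherwise hold only implicitly,
  // written down so that a fuzzer can try to falsify them.
  securityInvariant(
    Number.isInteger(offset) && Number.isInteger(length),
    "offset and length are integers",
  );
  securityInvariant(offset >= 0 && length >= 0, "offset and length are non-negative");
  securityInvariant(offset + length <= buf.length, "field lies within the buffer");
  return buf.subarray(offset, offset + length);
}
```

Because the checks live next to the code they describe, they keep running in later test and fuzzing campaigns, which is one way such assertions could serve as regression checks.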

Key insights

Agentic vulnerability detection improves when the LLM's assumptions are made explicit and continuously validated with fuzzing.

Method

Code-Augur infers security invariants from code, commits them as in-source assertions, and uses a guided fuzzer to falsify these assertions, refining them iteratively.
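The infer, falsify, refine cycle described above can be sketched as a small TypeScript loop. Everything here is an assumption made for illustration: the names `Invariant`, `falsify`, and `reasonFalsifyRefine` are hypothetical, the "fuzzer" is a plain input enumerator rather than a real grey-box fuzzer, and the `triage` callback stands in for the LLM agent's decision about whether a counterexample means the specification was too strict or the code has a bug.

```typescript
// Hypothetical sketch of a reason-falsify-refine loop; not Code-Augur's code.

/** A candidate security invariant over inputs of type I. */
interface Invariant<I> {
  description: string;
  holds: (input: I) => boolean;
}

/** Triage of a counterexample: weaken the specification, or report a finding. */
type Triage<I> = { kind: "refine"; next: Invariant<I> } | { kind: "finding" };

interface LoopResult<I> {
  invariant: Invariant<I>;
  findings: I[];
  rounds: number;
}

/** Stand-in for a guided fuzzer: tries inputs until one violates the invariant. */
function falsify<I>(
  inv: Invariant<I>,
  generate: (seed: number) => I,
  budget: number,
): I | undefined {
  for (let seed = 0; seed < budget; seed++) {
    const input = generate(seed);
    if (!inv.holds(input)) return input;
  }
  return undefined;
}

function reasonFalsifyRefine<I>(
  initial: Invariant<I>,
  generate: (seed: number) => I,
  triage: (inv: Invariant<I>, counterexample: I) => Triage<I>,
  maxRounds: number,
  budget: number,
): LoopResult<I> {
  let invariant = initial;
  const findings: I[] = [];
  let rounds = 0;
  while (rounds < maxRounds) {
    rounds++;
    const counterexample = falsify(invariant, generate, budget);
    if (counterexample === undefined) break; // invariant survived this round
    const verdict = triage(invariant, counterexample);
    if (verdict.kind === "finding") {
      findings.push(counterexample); // treated as a potential vulnerability
      break;
    }
    invariant = verdict.next; // specification was too strict; retry with the refined one
  }
  return { invariant, findings, rounds };
}

// Toy run: inputs are copy lengths into a 16-byte buffer.
const CAPACITY = 16;
const result = reasonFalsifyRefine<number>(
  { description: "len < 8", holds: (len) => len < 8 }, // initial guess, too strict
  (seed) => seed % 20,
  (_inv, counterexample) =>
    counterexample <= CAPACITY
      ? { kind: "refine", next: { description: "len <= 16", holds: (len) => len <= CAPACITY } }
      : { kind: "finding" },
  5,
  100,
);
```

In this toy run the first round falsifies the overly strict guess with a legitimate length of 8, so the invariant is weakened to `len <= 16`; the second round finds a length of 17, which the triage step reports as a finding. A real system would replace the enumerator with a coverage-guided fuzzer and the triage callback with the agent's reasoning.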

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, AI Scientist, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.