VulnAgent-X: Evidence-Calibrated Multi-Agent Auditing for Repository-Level Vulnerability Detection

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering · Depth: Expert, long

Summary

VulnAgent-X is a layered agentic framework for repository-level software vulnerability detection. It targets two limitations of existing methods: reliance on local code views and one-shot prediction. The framework integrates lightweight risk screening, bounded context expansion, specialized analysis agents, selective dynamic verification, and evidence fusion into a unified pipeline. Experiments on function-level (Devign, Big-Vul, PrimeVul) and just-in-time (JULY) vulnerability benchmarks show VulnAgent-X outperforming static baselines, encoder-based models (CodeBERT, GraphCodeBERT), and simpler agentic variants. It also localizes vulnerabilities better and offers a balanced performance-cost trade-off: compared to "always-full" execution, it uses 39.6% fewer tokens and 35.1% less time on average, with 62.4% of samples resolved via early exit. Key settings include a suspicious-region budget K=8 and a context budget B=6,000 tokens.
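The two budgets mentioned above can be illustrated with a minimal sketch. It assumes a greedy policy: rank regions by screening score, keep at most K, then add each region only if it still fits within B tokens. The summary does not describe the paper's actual selection rule, so the `Region` type, its fields, and `select_regions` are hypothetical names introduced here for illustration.

```python
from dataclasses import dataclass

K = 8      # suspicious-region budget, as reported in the summary
B = 6000   # context budget in tokens, as reported in the summary


@dataclass
class Region:
    name: str    # hypothetical identifier, e.g. "parser.c:read_header"
    risk: float  # hypothetical screening score in [0, 1]
    tokens: int  # hypothetical token cost of the region plus its context


def select_regions(regions, k=K, budget=B):
    """Greedy sketch (an assumption, not the paper's rule):
    keep the k highest-risk regions, then admit each one in
    risk order only if it still fits in the token budget."""
    ranked = sorted(regions, key=lambda r: r.risk, reverse=True)[:k]
    chosen, used = [], 0
    for region in ranked:
        if used + region.tokens <= budget:
            chosen.append(region)
            used += region.tokens
    return chosen, used
```

With three candidate regions costing 4,000, 3,000, and 1,500 tokens in descending risk order, this sketch keeps the first and third: the second would push the total past 6,000 tokens, while the third still fits.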

Key takeaway

For AI Security Engineers evaluating repository-level vulnerability detection tools, VulnAgent-X offers a robust, evidence-calibrated approach that significantly reduces false positives and improves localization. You should consider adopting multi-agent frameworks that integrate staged analysis and selective dynamic verification. This method enhances detection quality and provides interpretable results, crucial for managing security risks in complex software projects.

Key insights

Vulnerability detection is improved by a staged, evidence-driven auditing process using multi-agent collaboration and selective verification.

Principles

Method

VulnAgent-X performs fast risk screening, expands context, dispatches specialized agents (Router, Semantic, Security, Logic Bug, Sceptic), selectively verifies high-risk findings, and fuses evidence for a final decision.
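The staged flow described above can be sketched as a small control loop. This is a simplified illustration under stated assumptions, not the authors' implementation: the thresholds, the mean-based fusion rule, and the callable signatures are all hypothetical, and the context-expansion stage is omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical thresholds; the summary does not report the real values.
EARLY_EXIT_RISK = 0.2   # below this, stop after screening
VERIFY_RISK = 0.7       # at or above this, run dynamic verification
DECISION_RISK = 0.5     # final vulnerable / not-vulnerable cut-off


@dataclass
class Verdict:
    vulnerable: bool
    score: float
    stages: List[str] = field(default_factory=list)


def audit(code: str,
          screen: Callable[[str], float],
          agents: Dict[str, Callable[[str], float]],
          verify: Callable[[str], float]) -> Verdict:
    """Staged audit sketch: screen, dispatch agents, verify selectively, fuse."""
    if not agents:
        raise ValueError("at least one analysis agent is required")

    stages = ["screen"]
    risk = screen(code)
    if risk < EARLY_EXIT_RISK:
        # Early exit: low-risk samples skip the expensive stages.
        return Verdict(False, risk, stages)

    stages.append("agents")
    scores = [agent(code) for agent in agents.values()]
    fused = sum(scores) / len(scores)  # assumed fusion rule: plain mean

    if fused >= VERIFY_RISK:
        # Selective verification: only high-risk findings are checked.
        stages.append("verify")
        fused = (fused + verify(code)) / 2

    return Verdict(fused >= DECISION_RISK, fused, stages)
```

The early-exit branch is what lets most samples avoid the agent and verification stages, which is the mechanism behind the reported token and time savings relative to "always-full" execution.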

Best for: Research Scientist, AI Scientist, AI Security Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.