VulnAgent-X: Evidence-Calibrated Multi-Agent Auditing for Repository-Level Vulnerability Detection
Summary
VulnAgent-X is a layered agentic framework for repository-level software vulnerability detection. It addresses two limitations of existing methods: reliance on local code views and one-shot prediction. It integrates lightweight risk screening, bounded context expansion, specialized analysis agents, selective dynamic verification, and evidence fusion into a unified pipeline. In experiments on function-level benchmarks (Devign, Big-Vul, PrimeVul) and just-in-time (JIT) vulnerability benchmarks, VulnAgent-X outperforms static baselines, encoder-based models (CodeBERT, GraphCodeBERT), and simpler agentic variants. The framework also improves localization and balances performance against cost: compared with "always-full" execution, it uses 39.6% fewer tokens and 35.1% less time on average, with 62.4% of samples resolved via early exit. Key settings include a suspicious-region budget K=8 and a context budget B=6,000 tokens.
Key takeaway
For AI Security Engineers evaluating repository-level vulnerability detection tools, VulnAgent-X offers a robust, evidence-calibrated approach that significantly reduces false positives and improves localization. You should consider adopting multi-agent frameworks that integrate staged analysis and selective dynamic verification. This method enhances detection quality and provides interpretable results, crucial for managing security risks in complex software projects.
Key insights
Vulnerability detection is improved by a staged, evidence-driven auditing process using multi-agent collaboration and selective verification.
Principles
- Decompose vulnerability detection into staged inference.
- Fuse diverse evidence for robust security decisions.
- Prioritize selective dynamic verification for high-risk findings.
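The evidence-fusion principle can be illustrated with a minimal sketch. It assumes each agent or check emits a signed, weighted piece of evidence, and that the pieces are summed and squashed through a logistic function. The `Evidence` type, the source names, the weights, and the logistic rule are all hypothetical choices for illustration; this summary does not specify how VulnAgent-X actually fuses evidence.

```python
import math
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Evidence:
    source: str      # hypothetical label, e.g. "security_agent"
    strength: float  # confidence in [0, 1]
    supports: bool   # True if it supports the vulnerability hypothesis


# Hypothetical per-source weights: dynamic checks count more than static hints.
WEIGHTS = {
    "static_screen": 0.5,
    "security_agent": 1.0,
    "sceptic_agent": 1.0,
    "dynamic_check": 2.0,
}


def fuse(evidence: Iterable[Evidence], default_weight: float = 1.0) -> float:
    """Combine supporting and counter evidence into a score in (0, 1)."""
    total = 0.0
    for item in evidence:
        weight = WEIGHTS.get(item.source, default_weight)
        sign = 1.0 if item.supports else -1.0
        total += sign * weight * item.strength
    # Logistic squash: no evidence maps to 0.5, i.e. undecided.
    return 1.0 / (1.0 + math.exp(-total))
```

With this rule, counter-evidence from a sceptic-style agent lowers the score by the same mechanism that supporting evidence raises it, so a finding has to survive challenge before it crosses a decision threshold.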
Method
VulnAgent-X performs fast risk screening, expands context, dispatches specialized agents (Router, Semantic, Security, Logic Bug, Sceptic), selectively verifies high-risk findings, and fuses evidence for a final decision.
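The staged flow described above can be sketched as a small control loop with an early exit and selective verification. The stage functions are passed in as plain callables, and the names (`audit`, `AuditResult`), the threshold values, and the way a verification outcome adjusts the score are assumptions made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AuditResult:
    vulnerable: bool
    score: float
    stages_run: List[str] = field(default_factory=list)


def audit(
    code: str,
    screen: Callable[[str], float],
    expand: Callable[[str], list],
    analyze: Callable[[str, list], float],
    verify: Callable[[str, list], bool],
    exit_below: float = 0.2,    # hypothetical early-exit threshold
    verify_above: float = 0.7,  # hypothetical escalation threshold
    decide_at: float = 0.5,     # hypothetical final decision threshold
) -> AuditResult:
    stages = ["screen"]
    risk = screen(code)
    if risk < exit_below:
        # Early exit: low-risk samples skip the expensive stages.
        return AuditResult(False, risk, stages)

    stages.append("expand")
    context = expand(code)

    stages.append("analyze")
    score = analyze(code, context)

    if score >= verify_above:
        # Only high-risk findings pay for dynamic verification.
        stages.append("verify")
        confirmed = verify(code, context)
        score = max(score, 0.9) if confirmed else min(score, 0.4)

    stages.append("decide")
    return AuditResult(score >= decide_at, score, stages)
```

The `stages_run` list makes the cost of each decision visible: a sample resolved by early exit records only the screening stage, which is the kind of saving the reported early-exit rate refers to.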
In practice
- Implement a Sceptic Agent to reduce false positives by seeking counter-evidence.
- Use threshold-based escalation to balance detection reliability and runtime cost.
- Limit context expansion to B=6,000 tokens for efficiency.
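A bounded context expansion step might look like the following sketch, which greedily keeps the most relevant snippets until a token budget is reached. The `Snippet` type, the relevance scores, and the whitespace-based token count are hypothetical stand-ins. The defaults mirror the reported settings (8 regions, 6,000 tokens), but using the region budget as a cap on selected snippets is an assumption of this sketch, since the summary does not say exactly how the two budgets interact.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Snippet:
    path: str
    text: str
    relevance: float  # hypothetical score from a retrieval or screening step


def count_tokens(text: str) -> int:
    # Crude proxy: whitespace-separated words. A real system would use
    # the tokenizer of the underlying model.
    return len(text.split())


def select_context(
    candidates: Sequence[Snippet],
    max_regions: int = 8,
    budget: int = 6000,
) -> List[Snippet]:
    """Greedily pick the most relevant snippets that fit both budgets."""
    chosen: List[Snippet] = []
    used = 0
    for snippet in sorted(candidates, key=lambda s: s.relevance, reverse=True):
        if len(chosen) >= max_regions:
            break
        cost = count_tokens(snippet.text)
        if used + cost > budget:
            continue  # too large to fit; a smaller snippet may still fit
        chosen.append(snippet)
        used += cost
    return chosen
```

Skipping an oversized snippet instead of stopping lets smaller, lower-ranked snippets fill the remaining budget, at the price of a slightly less relevance-ordered context.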
Topics
- Vulnerability Detection
- Multi-Agent Systems
- Repository Analysis
- Code Security Auditing
- Large Language Models
- Dynamic Verification
Best for: Research Scientist, AI Scientist, AI Security Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.