ANVIL: Anomaly-based Vulnerability Identification without Labelled Training Data

Source: cs.SE updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Software Development & Engineering) · Depth: Advanced, extended

Summary

ANVIL is an anomaly-based software vulnerability detector that uses pre-trained Large Language Models (LLMs) and requires no labeled training data. It reframes vulnerability detection as an anomaly detection problem, building on the observation that LLMs trained for code generation reconstruct vulnerable code markedly less accurately than non-vulnerable code. ANVIL operates at line-level granularity and was evaluated on the Magma benchmark and a leakage-free 2024 CVEFixes dataset. Despite never training on labeled vulnerability data, it outperforms the leading supervised detectors LineVul and LineVD, achieving 1.62× to 2.18× better Top-5 accuracy and 1.02× to 1.29× better ROC scores. Experiments showed that larger LLMs, such as CodeLlama-13B, and adaptive Maximum Compound Statement (MCS) contexts improve detection. The approach generalizes to unseen vulnerabilities and performs consistently across datasets and LLM families, including CodeLlama, CodeQwen, and StarCoderBase.

Key takeaway

For AI security engineers building automated vulnerability detection, ANVIL shows that a label-free approach can outperform supervised ones. Consider integrating anomaly-based LLM techniques that use a pre-trained model's code generation ability to flag lines it cannot reconstruct. This reduces reliance on scarce, expensive labeled vulnerability datasets, and in the reported evaluation it beat supervised methods at both classifying and prioritizing unseen vulnerabilities.

Key insights

An LLM's difficulty in accurately reconstructing vulnerable code acts as an anomaly signal, so vulnerabilities can be detected without labeled training data.

Principles

Method

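ANVIL masks code lines, asks an LLM to reconstruct each masked line, and then computes a hybrid anomaly score from two signals: the reconstruction loss and whether the reconstruction exactly matches the original. Lines with high anomaly scores are flagged as likely vulnerabilities.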
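The mask, reconstruct, and score loop can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not ANVIL's implementation: the `Reconstructor` interface, the `toy_reconstruct` stand-in for an LLM, the `mismatch_penalty` weight, and the additive rule in `hybrid_score` are all hypothetical, since this summary does not say how the two signals are combined.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical interface: given the lines before and after a masked line, plus
# the original line, return (predicted_line, loss_on_original_line).
# A real setup would wrap a code LLM's infilling mode behind this signature.
Reconstructor = Callable[[List[str], List[str], str], Tuple[str, float]]


@dataclass
class LineScore:
    line_no: int       # 1-based line number in the snippet
    text: str
    loss: float        # reconstruction loss reported by the model
    exact_match: bool  # did the prediction reproduce the line exactly?
    score: float       # combined anomaly score (higher = more anomalous)


def hybrid_score(loss: float, exact_match: bool,
                 mismatch_penalty: float = 1.0) -> float:
    """Assumed combination rule: loss plus a fixed penalty on a mismatch."""
    return loss + (0.0 if exact_match else mismatch_penalty)


def score_lines(lines: List[str], reconstruct: Reconstructor,
                mismatch_penalty: float = 1.0) -> List[LineScore]:
    """Mask each non-blank line in turn, score it, and rank by anomaly score."""
    results = []
    for i, original in enumerate(lines):
        if not original.strip():
            continue  # blank lines carry no signal
        predicted, loss = reconstruct(lines[:i], lines[i + 1:], original)
        match = predicted.strip() == original.strip()
        results.append(LineScore(i + 1, original, loss, match,
                                 hybrid_score(loss, match, mismatch_penalty)))
    # Most anomalous lines first, so a reviewer can inspect the top k.
    return sorted(results, key=lambda r: r.score, reverse=True)


def toy_reconstruct(prefix: List[str], suffix: List[str],
                    original: str) -> Tuple[str, float]:
    """Stand-in for an LLM: reproduces 'familiar' lines, fails on the rest."""
    familiar = {
        "void copy(const char *src) {",
        "char buf[16];",
        "puts(buf);",
        "}",
    }
    stripped = original.strip()
    if stripped in familiar:
        return stripped, 0.1  # low loss, exact reconstruction
    return "", 2.5            # high loss, failed reconstruction


SNIPPET = [
    "void copy(const char *src) {",
    "    char buf[16];",
    "    strcpy(buf, src);",
    "    puts(buf);",
    "}",
]

ranked = score_lines(SNIPPET, toy_reconstruct)
for r in ranked[:3]:
    print(r.line_no, round(r.score, 2), r.text.strip())
```

With the toy model, the unchecked `strcpy` line ranks first because it is the only line the stand-in fails to reconstruct. Sorting by score rather than thresholding reflects the Top-5 style of evaluation mentioned in the summary, where the most suspicious lines are reviewed first.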

Best for: Research Scientist, AI Scientist, AI Security Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.