Snyk VulnBench JS 1.0: Can LLMs Find the Same Bugs Twice?
Summary
Snyk VulnBench JS 1.0 investigated the repeatability of agentic large language model (LLM) security reviews on JavaScript code, conducting 300 repeated vulnerability-finding scans. The study found LLM security findings were unevenly repeatable: reference-matched findings were stable, with 134 of 158 unique findings appearing in all five repetitions. However, extra model reports varied heavily, as 80 of 161 unique unmatched findings appeared in only one of five identical repetitions, and only 22 appeared in all five. The benchmark also highlighted complementarity, with models consistently identifying familiar, high-signal exploit shapes and even surfacing a potential Snyk Code product gap. Snyk Code static application security testing (SAST) proved deterministic and superior at systematically enumerating repeated data-flow sinks. The results advocate for combining agentic LLM review with deterministic SAST.
Key takeaway
AI security engineers evaluating code analysis tools should pair agentic LLM review with deterministic SAST. The combination uses LLMs to catch familiar, high-signal exploit shapes and relies on SAST to enumerate repeated data-flow sinks systematically, which offsets the run-to-run variability of LLM-generated reports.
Key insights
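LLM security review repeatability is uneven: reference-matched findings were stable (134 of 158, about 85%, appeared in all five repetitions), while unmatched extra reports were highly inconsistent (80 of 161, about 50%, appeared in only one repetition, and 22, about 14%, in all five).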
Principles
- LLM security findings are unevenly repeatable.
- Combine agentic LLM review with deterministic SAST.
Method
The study ran 300 vulnerability-finding scans on JavaScript code, repeating scans five times with the same prompt and benchmark harness, to measure how consistent LLM security review is from run to run.
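The repeatability measurement described above can be sketched as a simple tally: for each unique finding, count how many of the repeated runs reported it. This is a minimal illustration, not the benchmark's actual harness; the function name `repetition_histogram` and the finding keys (rule, file, line) are hypothetical, and it assumes findings have already been normalized so that the same issue gets the same key in every run.

```python
from collections import Counter

def repetition_histogram(runs):
    """Return {number of runs a finding appeared in: count of unique findings}.

    `runs` is a list of collections of finding keys, one per repetition.
    The key format is an assumption made for this sketch.
    """
    appearances = Counter()
    for findings in runs:
        # set() ensures a finding reported twice in one run counts once
        appearances.update(set(findings))
    return Counter(appearances.values())

# Five hypothetical repetitions of the same scan (invented example data)
runs = [
    {"sqli:app.js:42", "xss:view.js:10", "redos:util.js:7"},
    {"sqli:app.js:42", "xss:view.js:10"},
    {"sqli:app.js:42", "xss:view.js:10", "proto-pollution:merge.js:3"},
    {"sqli:app.js:42", "xss:view.js:10"},
    {"sqli:app.js:42"},
]

hist = repetition_histogram(runs)
for n_runs in sorted(hist, reverse=True):
    print(f"appeared in {n_runs} of {len(runs)} runs: {hist[n_runs]} finding(s)")
```

A histogram weighted toward the top bucket corresponds to the stable behavior reported for reference-matched findings; one weighted toward the bottom bucket corresponds to the variability reported for unmatched findings.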
In practice
- LLMs can find high-signal exploit shapes.
- SAST excels at enumerating data-flow sinks.
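One way to act on these two points is to treat deterministic SAST output as the baseline and admit LLM findings only when they recur across repeated runs, sending one-off reports to manual review. The sketch below is an assumption about how such a pipeline could look, not something the study prescribes: the function `triage`, the `min_runs` threshold, and the finding keys are all hypothetical, and it assumes SAST and LLM findings can be mapped to comparable keys.

```python
from collections import Counter

def triage(sast_findings, llm_runs, min_runs=3):
    """Split findings into a report set and a manual-review set.

    sast_findings: finding keys from a deterministic SAST scan.
    llm_runs: list of collections of finding keys, one per repeated LLM scan.
    min_runs: hypothetical threshold; an LLM finding is kept when it
              appears in at least this many runs.
    """
    counts = Counter()
    for findings in llm_runs:
        counts.update(set(findings))  # count each finding once per run

    stable_llm = {key for key, n in counts.items() if n >= min_runs}
    report = set(sast_findings) | stable_llm
    needs_review = set(counts) - report
    return report, needs_review

# Invented example data
sast = {"A", "B"}
llm_runs = [{"A", "C", "D"}, {"A", "C"}, {"C", "E"}]

report, needs_review = triage(sast, llm_runs, min_runs=2)
print(sorted(report))
print(sorted(needs_review))
```

The threshold trades recall for stability: a lower `min_runs` keeps more of the variable LLM reports, while a higher one keeps only findings that behave like the stable, reference-matched group.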
Topics
- Snyk VulnBench JS
- LLM Security Review
- Static Application Security Testing
- Vulnerability Finding
- JavaScript Security
- Code Analysis Repeatability
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Security Engineer, AI Scientist, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.