FORGE: Multi-Agent Graduated Exploitation and Detection Engineering

2026-06-02 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

FORGE is a multi-agent system that integrates vulnerability assessment, prioritization, and detection rule engineering to cope with high vulnerability disclosure volumes. It runs five specialized agents (Intel, Generator, Planner, Exploit, and Detector) in a fixed pipeline. The system generates targeted vulnerable applications from CVE metadata, performs coached multi-turn exploitation that an LLM-primary oracle grades on a four-level taxonomy (L0-L3), and derives Sigma and Snort detection rules from OpenTelemetry traces of the exploitation. Graduated exploitation depth is the core bridging mechanism: it supplies detailed behavioral traces for detection and ground truth for validating prioritization. A tiered knowledge architecture transfers experience across assessments. On 603 CVEs from the CVE-GENIE dataset, spanning eight languages and 187 CWE types, FORGE reached L1 or deeper end to end on 67.8% of CVEs at USD 1.50 per CVE. Exploitation rates stayed near 68% across EPSS and CVSS bands. Detection rules derived from L2+ exploitation showed significantly higher span-normalized grounding than L1-derived rules (p=0.035), and 93.4% of generated Snort rules produced zero false positives against a synthetic benign corpus.

Key takeaway

For AI security engineers and machine learning engineers tasked with scaling vulnerability assessment and detection, FORGE shows that these traditionally siloed functions can run as a single pipeline. Consider adopting multi-agent automation with graduated exploitation depth: deeper exploitation traces yield better-grounded detection rules and supply ground truth for checking prioritization. This approach can help your organization keep pace with the growing volume of vulnerability disclosures.

Key insights

FORGE unifies vulnerability exploitation, prioritization, and detection engineering via graduated depth and multi-agent automation.

Principles

Graduated exploitation depth enhances detection rule quality.
Multi-agent systems can bridge isolated security research silos.
Exploitation reachability is orthogonal to metadata-based prioritization: exploitation rates stayed near 68% across EPSS and CVSS bands.
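
One way to check the third principle on your own assessments is to compare the share of CVEs reaching L1 or deeper within each score band. The following is a minimal sketch, not FORGE's evaluation code: the band cut points, the function names, and the toy records are all assumptions made for illustration, and the numbers are not results from the paper.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def band_of(epss: float) -> str:
    """Hypothetical three-way EPSS banding; the cut points are assumptions."""
    if epss < 0.1:
        return "low"
    if epss < 0.5:
        return "mid"
    return "high"


def l1_plus_rate_by_band(records: Iterable[Tuple[float, int]]) -> Dict[str, float]:
    """Take (epss_score, reached_depth) pairs and return the L1+ share per band."""
    hits: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for epss, depth in records:
        band = band_of(epss)
        totals[band] += 1
        if depth >= 1:  # depth 0 means L0, anything higher counts as L1+
            hits[band] += 1
    return {band: hits[band] / totals[band] for band in totals}


# Made-up toy records for illustration only, not FORGE data.
toy = [(0.02, 1), (0.05, 0), (0.30, 2), (0.40, 0), (0.90, 3), (0.95, 0)]
rates = l1_plus_rate_by_band(toy)
print(rates)
```

If the rates are roughly flat across bands, as the summary reports for FORGE, the score carries little information about whether a CVE is exploitable in practice.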

Method

FORGE runs a fixed five-agent pipeline: Intel, Generator, Planner, Exploit, Detector. It generates targeted vulnerable applications from CVE metadata, conducts coached multi-turn exploitation graded by an LLM-primary oracle (L0-L3), and produces Sigma and Snort detection rules from OpenTelemetry exploitation traces.
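
The fixed ordering can be pictured as a list of agents that each receive and return a shared state object. This is a minimal sketch under stated assumptions, not FORGE's implementation: the `Assessment` class, the stub agents, and the placeholder CVE identifier are hypothetical, and only the agent names, their order, and the L0-L3 labels come from the summary.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, List


class Depth(IntEnum):
    """Four-level taxonomy named in the summary (L0-L3).
    The summary does not define the levels, so no meanings are attached here."""
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3


@dataclass
class Assessment:
    """Hypothetical state passed from one agent to the next."""
    cve_id: str
    depth: Depth = Depth.L0
    log: List[str] = field(default_factory=list)


Agent = Callable[[Assessment], Assessment]


def make_stub(name: str) -> Agent:
    """Build a placeholder agent that only records that it ran."""
    def run(state: Assessment) -> Assessment:
        state.log.append(name)
        return state
    return run


# Agent order taken from the summary; the agent bodies are stubs.
AGENT_ORDER = ("Intel", "Generator", "Planner", "Exploit", "Detector")
PIPELINE: List[Agent] = [make_stub(name) for name in AGENT_ORDER]


def run_pipeline(cve_id: str) -> Assessment:
    """Run every agent once, in the fixed order."""
    state = Assessment(cve_id)
    for agent in PIPELINE:
        state = agent(state)
    return state


result = run_pipeline("CVE-0000-0000")  # placeholder identifier
print(result.log)
```

In a real system the Exploit stage would set `depth` from the oracle's verdict, and the Detector stage would read it to decide which traces to use.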

In practice

Generate detection rules from L2+ exploitation traces.
Use LLM-primary oracles for multi-level vulnerability assessment.
Automate CVE-to-exploit-to-detection workflows.
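
The first item above can be sketched as a gate that only emits a rule when the trace reached L2 or deeper. This is an illustrative sketch, not FORGE's Detector agent: the function name, the choice of `url.path` as the span attribute, the mapping to the Sigma webserver field `cs-uri-stem`, and the sample spans are all assumptions. The rule is built as a plain dictionary in the shape of a Sigma rule so the example needs no YAML library.

```python
from typing import Dict, List, Optional

MIN_DEPTH_FOR_RULES = 2  # L2+, following the first "In practice" item


def sigma_rule_from_trace(
    cve_id: str, depth: int, spans: List[Dict[str, str]]
) -> Optional[dict]:
    """Return a Sigma-style rule dict, or None when the trace is too shallow."""
    if depth < MIN_DEPTH_FOR_RULES:
        return None
    # Collect the distinct request paths seen in the exploitation spans.
    paths = sorted({s["url.path"] for s in spans if "url.path" in s})
    if not paths:
        return None
    return {
        "title": f"Possible exploitation of {cve_id}",
        "status": "experimental",
        "logsource": {"category": "webserver"},
        "detection": {
            # A list value in Sigma matches any one of its entries.
            "selection": {"cs-uri-stem": paths},
            "condition": "selection",
        },
        "level": "high",
    }


# Made-up span attributes for illustration only.
spans = [
    {"url.path": "/upload"},
    {"url.path": "/admin/run"},
    {"http.request.method": "POST"},
]
rule = sigma_rule_from_trace("CVE-0000-0000", 2, spans)
skipped = sigma_rule_from_trace("CVE-0000-0000", 1, spans)
print(rule["detection"], skipped)
```

Matching on paths alone would be noisy in production; the point of the sketch is the depth gate and the trace-to-rule mapping, which a real generator would refine with further span attributes.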

Topics

Multi-Agent Systems
Vulnerability Exploitation
Detection Engineering
CVE Prioritization
OpenTelemetry
LLM Oracle

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Security Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.