FORGE: Multi-Agent Graduated Exploitation and Detection Engineering

Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

FORGE is a multi-agent system that integrates vulnerability assessment, prioritization, and detection rule engineering to cope with high vulnerability disclosure volumes. It runs five specialized agents (Intel, Generator, Planner, Exploit, and Detector) in a fixed pipeline. The system generates targeted vulnerable applications from CVE metadata, performs coached multi-turn exploitation assessed by an LLM-primary oracle using a four-level taxonomy (L0-L3), and creates Sigma and Snort detection rules from OpenTelemetry exploitation traces. Graduated exploitation depth is the core bridging mechanism: it provides detailed behavioral traces for detection and ground truth for validating prioritization. A tiered knowledge architecture transfers experience across assessments. Evaluation on 603 CVEs from the CVE-GENIE dataset demonstrated 67.8% end-to-end L1+ exploitation at USD 1.50 per CVE, spanning eight languages and 187 CWE types. Exploitation rates stayed near 68% across both EPSS and CVSS bands. Detection rules derived from L2+ exploitation showed significantly higher span-normalized grounding than L1-derived rules (p=0.035), and 93.4% of generated Snort rules produced zero false positives against a synthetic benign corpus.

Key takeaway

For AI Security Engineers or Machine Learning Engineers tasked with scaling vulnerability assessment and detection, FORGE demonstrates a viable path to integrate these traditionally siloed functions. You should consider adopting multi-agent systems and graduated exploitation depth to generate high-fidelity detection rules and improve prioritization ground truth. This approach can significantly enhance your organization's capacity to respond to the increasing volume of vulnerability disclosures.

Key insights

FORGE unifies vulnerability exploitation, prioritization, and detection engineering via graduated depth and multi-agent automation.

Method

FORGE uses a five-agent pipeline: Intel, Generator, Planner, Exploit, Detector. It generates vulnerable apps, conducts LLM-assessed multi-turn exploitation (L0-L3), and produces detection rules from OpenTelemetry traces.
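
The fixed ordering of the five agents can be pictured with a minimal Python sketch. Everything below is hypothetical and for illustration only: the Assessment record, the agent function bodies, the oracle callable, and the choice to emit rules only at L1 or deeper are assumptions, not FORGE's actual implementation. Strings stand in for OpenTelemetry spans and for Sigma/Snort rules, and the Depth enum encodes only the ordering L0 < L1 < L2 < L3, not what each level means.

```python
from dataclasses import dataclass, field
from enum import IntEnum
from typing import Callable, List


class Depth(IntEnum):
    """Four-level taxonomy; only the ordering is taken from the summary."""
    L0 = 0
    L1 = 1
    L2 = 2
    L3 = 3


@dataclass
class Assessment:
    """Hypothetical record passed from agent to agent."""
    cve_id: str
    notes: List[str] = field(default_factory=list)  # agents that ran, in order
    depth: Depth = Depth.L0
    trace: List[str] = field(default_factory=list)  # stand-in for trace spans
    rules: List[str] = field(default_factory=list)  # stand-in for detection rules


Oracle = Callable[[Assessment], Depth]


def intel(a: Assessment) -> Assessment:
    a.notes.append("intel")  # would gather CVE metadata
    return a


def generator(a: Assessment) -> Assessment:
    a.notes.append("generator")  # would build the vulnerable application
    return a


def planner(a: Assessment) -> Assessment:
    a.notes.append("planner")  # would plan the exploitation attempt
    return a


def make_exploit(oracle: Oracle) -> Callable[[Assessment], Assessment]:
    def exploit(a: Assessment) -> Assessment:
        a.notes.append("exploit")
        a.trace = [f"span:{a.cve_id}:attempt"]  # placeholder trace
        a.depth = oracle(a)  # oracle assigns the achieved depth
        return a
    return exploit


def detector(a: Assessment) -> Assessment:
    a.notes.append("detector")
    # Assumption: rules are only derived when exploitation reached L1 or deeper.
    if a.depth >= Depth.L1:
        a.rules = [f"rule derived from {span}" for span in a.trace]
    return a


def run_pipeline(cve_id: str, oracle: Oracle) -> Assessment:
    """Run the five agents in their fixed order."""
    a = Assessment(cve_id)
    for agent in (intel, generator, planner, make_exploit(oracle), detector):
        a = agent(a)
    return a
```

Calling run_pipeline("CVE-0000-0000", lambda a: Depth.L2) with a stub oracle returns an Assessment whose notes list the five agents in order and whose rules are derived from the placeholder trace. A stub that returns Depth.L0 yields no rules.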

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Security Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.