How Amazon uses agentic AI for vulnerability detection at global scale

2026-04-08 · Source: Amazon Science homepage · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Intermediate, medium

Summary

Amazon's RuleForge system uses agentic AI to accelerate the creation of production-ready vulnerability detection rules, achieving a 336% increase in speed compared with traditional manual authoring. The system responds to the growing volume of newly published Common Vulnerabilities and Exposures (CVEs), more than 48,000 in 2025 alone, by automating the translation of vulnerability disclosures into robust detection logic. RuleForge uses a multi-agent architecture that mirrors the workflow of human experts, with specialized AI agents for ingestion, parallel rule generation, AI-powered evaluation, and multistage validation. A critical component is a separate "judge" model that, through domain-specific prompts and negative phrasing, reduces false positives by 67% while preserving true positives, which keeps precision high enough for production security systems. A human-in-the-loop design keeps final approval with human reviewers and shortens the gap between a vulnerability's disclosure and a deployed defense.
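
The headline figures are easier to interpret with the arithmetic written out. The sketch below uses hypothetical alert counts (the article's underlying numbers are not given here). Read literally, a 336% increase means 4.36 times the original speed, not 3.36 times; and cutting false positives by 67% while holding true positives fixed raises precision, the share of alerts that are real.

```python
def speedup_factor(percent_increase: float) -> float:
    """Convert an 'X% increase' into a multiplier: +100% means twice as fast."""
    return 1 + percent_increase / 100


def precision(true_positives: int, false_positives: int) -> float:
    """Share of rule alerts that correspond to real exploitation attempts."""
    return true_positives / (true_positives + false_positives)


# Hypothetical alert counts for one batch of candidate rules (not from the article).
tp_before, fp_before = 90, 30
tp_after = tp_before                       # true positives preserved
fp_after = round(fp_before * (1 - 0.67))   # 67% fewer false positives -> 10

print(f"speed multiplier: {speedup_factor(336):.2f}x")
print(f"precision without judge: {precision(tp_before, fp_before):.2f}")
print(f"precision with judge:    {precision(tp_after, fp_after):.2f}")
```

With these assumed counts, precision rises from 0.75 to 0.90; the size of the gain depends on how many false positives there were to begin with.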

Key takeaway

For AI architects and security teams tasked with scaling vulnerability defense, RuleForge demonstrates that agentic AI can augment human expertise at production scale. Consider adopting a multi-agent architecture with distinct generation and evaluation models to accelerate rule creation and reduce false positives. This approach lets your team shift its focus from manual authoring to critical review, multiplying throughput and strengthening protection against high-severity CVEs.

Key insights

Agentic AI systems can dramatically accelerate vulnerability detection rule generation while maintaining high precision through specialized agents.

Principles

Decompose complex tasks into specialized AI agent stages.
Separate generation from evaluation for improved accuracy.
Incorporate human-in-the-loop for final validation.
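
The three principles can be expressed as a pipeline skeleton. This is a minimal sketch, not RuleForge's code: `Candidate`, `run_pipeline`, the 0.5 score threshold, the placeholder CVE ID, and the stub generator and judge are all hypothetical. The point is structural: generation and evaluation are separate callables (in practice, separate models or prompts), and the pipeline ends in a human review queue rather than in deployment.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Candidate:
    cve_id: str
    rule: str
    judge_score: float = 0.0
    status: str = "generated"


# Each stage is a separate, swappable callable (hypothetical interfaces).
Generator = Callable[[str], list[str]]    # disclosure -> candidate rules
Judge = Callable[[str, str], float]       # (disclosure, rule) -> score in [0, 1]


def run_pipeline(cve_id: str, disclosure: str, generate: Generator,
                 judge: Judge, threshold: float = 0.5) -> list[Candidate]:
    """Generate, score with an independent judge, and queue survivors for review."""
    candidates = [Candidate(cve_id, rule) for rule in generate(disclosure)]
    for c in candidates:
        # The judge sees only the disclosure and the rule, not the generator's reasoning.
        c.judge_score = judge(disclosure, c.rule)
    shortlisted = [c for c in candidates if c.judge_score >= threshold]
    for c in shortlisted:
        c.status = "pending_human_review"  # nothing ships without a reviewer
    return sorted(shortlisted, key=lambda c: c.judge_score, reverse=True)


# Stubs standing in for model calls.
def fake_generator(disclosure: str) -> list[str]:
    return ["rule-a", "rule-b", "rule-c"]


def fake_judge(disclosure: str, rule: str) -> float:
    return {"rule-a": 0.9, "rule-b": 0.2, "rule-c": 0.6}[rule]


queue = run_pipeline("CVE-0000-0000", "example disclosure", fake_generator, fake_judge)
print([(c.rule, c.status) for c in queue])
```

Because the judge is just another callable, it can be swapped for a different model or prompt without touching the generator, which is the practical benefit of separating the two stages.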

Method

RuleForge ingests exploit code, generates multiple candidate rules in parallel using AWS Fargate and Amazon Bedrock, evaluates them with a dedicated judge model, and validates them against synthetic tests and traffic logs before human review.
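
The parallel-generation and validation steps can be sketched locally. In this illustration, a `ThreadPoolExecutor` stands in for parallel Fargate tasks, a stub function stands in for the Bedrock call, and candidate rules are plain regular expressions matched against HTTP request lines. RuleForge's actual rule format and test harness are not described here, so all of these, including the sample requests, are assumptions. A candidate passes only if it matches every synthetic exploit request and none of the benign traffic-log lines.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Hypothetical candidate rules a model might return for a path-traversal CVE.
STUB_RULES = {
    "strict": r"/cgi-bin/.*\.\./\.\./etc/passwd",
    "loose": r"\.\./",
    "off_target": r"etc/shadow",
}


def generate_candidate(strategy: str) -> str:
    """Stand-in for one generation task (one model call per prompting strategy)."""
    return STUB_RULES[strategy]


def generate_in_parallel(strategies: list[str]) -> list[str]:
    # Fan out one task per strategy; map() returns results in input order.
    with ThreadPoolExecutor(max_workers=len(strategies)) as pool:
        return list(pool.map(generate_candidate, strategies))


def validate(rule: str, synthetic_attacks: list[str], benign_traffic: list[str]) -> bool:
    """Accept a rule only if it catches every synthetic exploit and no benign request."""
    pattern = re.compile(rule)
    catches_all = all(pattern.search(req) for req in synthetic_attacks)
    false_hits = sum(1 for req in benign_traffic if pattern.search(req))
    return catches_all and false_hits == 0


SYNTHETIC_ATTACKS = [
    "GET /cgi-bin/x/../../etc/passwd HTTP/1.1",
    "GET /cgi-bin/status.cgi/../../etc/passwd HTTP/1.1",
]
BENIGN_TRAFFIC = [
    "GET /index.html HTTP/1.1",
    "GET /static/css/../img/logo.png HTTP/1.1",
]

candidates = generate_in_parallel(["strict", "loose", "off_target"])
survivors = [r for r in candidates if validate(r, SYNTHETIC_ATTACKS, BENIGN_TRAFFIC)]
print(survivors)
```

Only the strict rule survives. The loose rule is the kind of candidate validation exists to catch: it flags every attack but also fires on a harmless relative path, while the off-target rule misses the exploit entirely.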

In practice

Use negative phrasing in prompts for better LLM calibration.
Employ domain-specific prompts for evaluation agents.
Implement multi-agent systems for complex security tasks.
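
As an illustration of the first two practices, a judge prompt can combine domain-specific framing with negative phrasing by asking the model to look for reasons to reject a rule instead of asking whether the rule is good. The prompt wording, the `NO_PROBLEMS_FOUND`/`PROBLEMS_FOUND` verdict format, and both function names are assumptions for this sketch, not RuleForge's actual prompts; the model call itself is omitted.

```python
def build_judge_prompt(cve_summary: str, rule: str) -> str:
    """Build a hypothetical judge prompt with domain framing and negative phrasing."""
    return (
        # Domain-specific framing: what a production detection rule must do.
        "You are reviewing a detection rule for a production intrusion detection system.\n"
        f"Vulnerability: {cve_summary}\n"
        f"Candidate rule: {rule}\n\n"
        # Negative phrasing: ask for failure modes rather than for approval.
        "List every way this rule would NOT detect exploitation of this vulnerability, "
        "and every way it would fire on benign traffic.\n"
        "On the first line, answer with exactly one of: NO_PROBLEMS_FOUND or PROBLEMS_FOUND.\n"
        "Then list the problems, one per line."
    )


def parse_verdict(response: str) -> tuple[bool, list[str]]:
    """Return (accepted, problems); anything unparseable counts as a rejection."""
    lines = [ln.strip() for ln in response.strip().splitlines() if ln.strip()]
    if not lines:
        return False, ["empty judge response"]
    head, problems = lines[0], lines[1:]
    if head not in ("NO_PROBLEMS_FOUND", "PROBLEMS_FOUND"):
        return False, ["unparseable verdict: " + head]
    return head == "NO_PROBLEMS_FOUND", problems


prompt = build_judge_prompt("path traversal in a CGI handler", r"\.\./")
print(parse_verdict("PROBLEMS_FOUND\nmatches any relative path containing ../\n"))
```

Treating an unparseable answer as a rejection keeps the judge conservative: a formatting failure in the model's output cannot let a rule through to review as if it had passed.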

Topics

RuleForge
Agentic AI
Vulnerability Detection
CVEs
Security Automation

Best for: AI Architect, AI Product Manager, CTO, AI Security Engineer, AI Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Amazon Science homepage.