This AI Startup’s Army Of 15,000 Hackers Pressure Test Claude, GPT-5 And Gemini - Forbes

· Source: Forbes via Google News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Intermediate, short

Summary

Gray Swan, an AI security startup founded in 2023 by Carnegie Mellon professors Matt Fredrikson and Zico Kolter, has raised $40 million in Series A funding, bringing its valuation to $200 million. The company specializes in pressure testing frontier AI models like Anthropic's Claude Mythos and OpenAI's GPT-5 for safety vulnerabilities. Gray Swan operates Arena, a platform where 15,000 security professionals "red team" AI systems, identifying and fixing exploits. This human-generated data trains two products: Shade, an AI agent that actively seeks vulnerabilities, and Cygnal, software that monitors AI model prompts and outputs to prevent harmful responses. Having initially served major AI labs such as OpenAI, Anthropic, and Google DeepMind, Gray Swan is now expanding to provide security tools to enterprises building their own AI products. Snowflake already uses its software for the Cortex Code and Snowflake Intelligence agents.

Key takeaway

AI security engineers evaluating new model deployments should prioritize robust red-teaming and continuous monitoring. As AI agents interact with external tools, their attack surface expands, which calls for proactive testing that goes beyond traditional methods. Consider AI-driven security tools, such as Gray Swan's Shade and Cygnal, to identify subtle vulnerabilities and to block malicious prompt injections or data exfiltration. A security strategy should also account for unpredictable attack vectors that emerge as AI systems become more capable.

Key insights

Human red-teaming data is crucial for training AI agents to proactively identify and mitigate complex AI system vulnerabilities.

Method

Gray Swan trains its AI agent, Shade, using data from 15,000 human red-teamers on Arena. Shade continuously attacks systems, while Cygnal monitors prompts and outputs to block harmful generations and unauthorized tool access.
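
The monitoring pattern described above (screening prompts, outputs, and tool calls) can be illustrated with a minimal sketch. This is not Gray Swan's implementation or Cygnal's API, neither of which is described in the source. The names `check_text`, `check_tool_call`, `guarded_generate`, and the phrase deny-list are all hypothetical, and a production monitor would presumably rely on trained classifiers rather than string matching.

```python
from dataclasses import dataclass
from typing import AbstractSet, Callable, Optional, Tuple

# Hypothetical deny-list used only for illustration; a real monitor
# would likely use a trained classifier instead of substring checks.
BLOCKED_PHRASES = ("ignore previous instructions", "exfiltrate")


@dataclass
class Verdict:
    allowed: bool
    reason: str = ""


def check_text(text: str) -> Verdict:
    """Flag text that contains any blocked phrase (case-insensitive)."""
    lowered = text.lower()
    for phrase in BLOCKED_PHRASES:
        if phrase in lowered:
            return Verdict(False, f"matched blocked phrase: {phrase!r}")
    return Verdict(True)


def check_tool_call(tool_name: str, allowed_tools: AbstractSet[str]) -> Verdict:
    """Allow a tool call only if the tool is on the explicit allowlist."""
    if tool_name not in allowed_tools:
        return Verdict(False, f"tool not on allowlist: {tool_name!r}")
    return Verdict(True)


# Assumed model interface: takes a prompt, returns the output text and
# the name of a tool it wants to call (or None if no tool is requested).
Model = Callable[[str], Tuple[str, Optional[str]]]


def guarded_generate(prompt: str, model: Model,
                     allowed_tools: AbstractSet[str]) -> str:
    """Screen the prompt, then the requested tool, then the output."""
    verdict = check_text(prompt)
    if not verdict.allowed:
        return f"[blocked prompt] {verdict.reason}"

    output, tool = model(prompt)

    if tool is not None:
        verdict = check_tool_call(tool, allowed_tools)
        if not verdict.allowed:
            return f"[blocked tool call] {verdict.reason}"

    verdict = check_text(output)
    if not verdict.allowed:
        return f"[blocked output] {verdict.reason}"
    return output
```

The sketch checks at three points (input, tool request, output) because each is a separate place where an attack can surface: a prompt injection arrives in the input, an unauthorized action shows up as a tool request, and a harmful generation appears only in the output.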

Best for: Investor, CTO, VP of Engineering/Data, AI Security Engineer, AI Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Forbes via Google News.