This AI Startup’s Army Of 15,000 Hackers Pressure Test Claude, GPT-5 And Gemini - Forbes
Summary
Gray Swan, an AI security startup founded in 2023 by Carnegie Mellon professors Matt Fredrikson and Zico Kolter, has raised \$40 million in Series A funding, bringing its valuation to \$200 million. The company specializes in pressure testing frontier AI models like Anthropic's Claude Mythos and OpenAI's GPT-5 for safety vulnerabilities. Gray Swan operates Arena, a platform where 15,000 security professionals "red team" AI systems, identifying and fixing exploits. This human-generated data trains Gray Swan's AI agent, Shade, which actively seeks vulnerabilities, and Cygnal, software that monitors AI model prompts and outputs to prevent harmful responses. Initially serving major AI labs such as OpenAI, Anthropic, and Google Deepmind, Gray Swan is now expanding to provide security tools to enterprises building their own AI products, with Snowflake already utilizing their software for its Cortex Code and Snowflake Intelligence agents.
Key takeaway
For AI Security Engineers evaluating new model deployments, you must prioritize robust red-teaming and continuous monitoring solutions. The expanding attack surface of AI agents interacting with external tools necessitates proactive, sophisticated testing beyond traditional methods. Implement AI-driven security agents, like Gray Swan's Shade and Cygnal, to identify subtle vulnerabilities and prevent malicious prompt injections or data exfiltration. Your security strategy should account for unpredictable attack vectors emerging from increasingly intelligent AI systems.
Key insights
Human red-teaming data is crucial for training AI agents to proactively identify and mitigate complex AI system vulnerabilities.
Principles
- AI systems create new, unpredictable attack surfaces.
- Human-driven red teaming enhances AI security tools.
- Jailbreaking AI models requires increasing complexity.
Method
Gray Swan trains its AI agent, Shade, using data from 15,000 human red-teamers on Arena. Shade continuously attacks systems, while Cygnal monitors prompts and outputs to block harmful generations and unauthorized tool access.
In practice
- Use red-teaming to find AI model vulnerabilities.
- Implement AI agents for continuous security testing.
- Monitor AI prompts/outputs for malicious activity.
Topics
- AI Security
- Red Teaming
- Large Language Models
- AI Agents
- Vulnerability Testing
- Prompt Injection
Best for: Investor, CTO, VP of Engineering/Data, AI Security Engineer, AI Engineer, Director of AI/ML
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Series A" OR "Series B" OR "Series C" AI startup via Google News.