New benchmark shows AI agents can exploit most smart contract vulnerabilities on their own
Summary
OpenAI and crypto investment firm Paradigm have introduced EVMbench, a benchmark that evaluates how well AI agents can identify, repair, and exploit security vulnerabilities in Ethereum smart contracts. The dataset comprises 120 vulnerabilities drawn from 40 real security audits. In a realistic test environment where agents interacted autonomously with a local blockchain, the GPT-5.3-Codex model exploited 72 percent of vulnerabilities and fixed 41.5 percent. Claude Opus 4.6 led in detection, identifying 45.6 percent. The researchers noted that locating vulnerabilities in large codebases is the primary challenge: giving agents hints about where a vulnerability sits raised exploit success rates from 63 to 96 percent and fix rates from 39 to 94 percent. With over $100 billion held in smart contracts, this capability offers both an opportunity to strengthen security and a risk of automated attacks.
Key takeaway
For security teams managing Ethereum smart contracts, the EVMbench findings highlight AI's dual role: agents can substantially improve vulnerability detection and repair, but the same capabilities can be used to exploit contracts. Consider integrating AI agents such as GPT-5.3-Codex into your security auditing process, particularly for pinpointing known vulnerability types. Prioritize defenses against AI-driven exploits, since agents succeed at high rates once a vulnerability's location is known.
Key insights
AI agents demonstrate significant capability in exploiting and fixing smart contract vulnerabilities, especially with location hints.
Principles
- Vulnerability localization is key for AI agent effectiveness.
- AI agents can autonomously interact with blockchain environments.
Method
EVMbench evaluates AI agents by having them find, fix, and exploit smart contract vulnerabilities on a local blockchain, using a dataset of 120 real-world audit findings.
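As a rough illustration of how results from the three tasks might be tallied, here is a minimal Python sketch. The `Attempt` record, the mode names ("detect", "patch", "exploit"), and the sample data are all hypothetical, chosen for illustration; they are not EVMbench's actual schema, harness, or results.

```python
from dataclasses import dataclass


# Hypothetical record of one agent attempt on one audit finding.
# Field names are illustrative, not EVMbench's actual schema.
@dataclass
class Attempt:
    finding_id: str
    mode: str      # assumed labels: "detect", "patch", or "exploit"
    success: bool


def success_rates(attempts):
    """Return the fraction of successful attempts for each mode."""
    totals = {}
    wins = {}
    for a in attempts:
        totals[a.mode] = totals.get(a.mode, 0) + 1
        wins[a.mode] = wins.get(a.mode, 0) + int(a.success)
    return {mode: wins[mode] / totals[mode] for mode in totals}


# Toy data, made up for illustration only (not benchmark results).
attempts = [
    Attempt("F-1", "detect", True),
    Attempt("F-2", "detect", False),
    Attempt("F-1", "patch", True),
    Attempt("F-2", "patch", False),
    Attempt("F-1", "exploit", True),
    Attempt("F-2", "exploit", True),
]

rates = success_rates(attempts)
print(rates)
```

Reporting a separate rate per mode mirrors the way the summary quotes distinct detection, fix, and exploit percentages rather than a single combined score.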
In practice
- Use AI for smart contract vulnerability detection.
- Integrate AI agents into security audit workflows.
Topics
- EVMbench
- Smart Contract Security
- AI Agents
- Vulnerability Exploitation
- GPT-5.3-Codex
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, AI Security Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.