New benchmark shows AI agents can exploit most smart contract vulnerabilities on their own

2026-02-19 · Source: The Decoder · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Blockchain & Distributed Ledger Technology · Depth: Advanced, quick

Summary

OpenAI and crypto investment firm Paradigm introduced EVMbench, a new benchmark designed to evaluate AI agents' ability to identify, repair, and exploit security vulnerabilities in Ethereum smart contracts. The dataset comprises 120 vulnerabilities derived from 40 actual security audits. In a realistic test environment where AI agents interacted autonomously with a local blockchain, the GPT-5.3-Codex model exploited 72 percent of vulnerabilities and fixed 41.5 percent. Claude Opus 4.6 led in detection, identifying 45.6 percent. Researchers noted that locating vulnerabilities in extensive codebases is the primary challenge. Providing hints about vulnerability locations raised exploit success rates from 63 to 96 percent and fix rates from 39 to 94 percent. With over $100 billion in assets held in smart contracts, this capability presents both an opportunity to strengthen security and a potential risk.

Key takeaway

For security teams managing Ethereum smart contracts, the EVMbench findings highlight AI's dual potential: agents can help detect and repair vulnerabilities, but they can also exploit them. You should explore integrating AI agents, like GPT-5.3-Codex, into your security auditing processes, particularly for repairing vulnerabilities whose location is already known, since detection remains the weakest of the three tasks. Prioritize developing robust defense mechanisms against AI-driven exploits, given their high success rates when vulnerability locations are known.

Key insights

AI agents demonstrate significant capability in exploiting and fixing smart contract vulnerabilities, especially with location hints.

Principles

Vulnerability localization is key for AI agent effectiveness.
AI agents can autonomously interact with blockchain environments.
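
The localization principle can be checked against the hinted and unhinted rates quoted in the summary. The short Python sketch below is only arithmetic on those four published figures. It assumes the hinted and unhinted runs cover the same set of tasks, which the summary does not state explicitly, and the function name is illustrative.

```python
# Rates quoted in the summary, in percent: (without hints, with hints).
rates = {
    "exploit": (63, 96),
    "fix": (39, 94),
}


def failure_share_removed(without_hint: float, with_hint: float) -> float:
    """Fraction of unhinted failures that disappear once hints are given."""
    failures_before = 100 - without_hint
    failures_after = 100 - with_hint
    return (failures_before - failures_after) / failures_before


for task, (base, hinted) in rates.items():
    gain = hinted - base  # improvement in percentage points
    removed = failure_share_removed(base, hinted)
    print(f"{task}: +{gain} points, {removed:.0%} of failures removed")
```

Under that assumption, hints add 33 points for exploits and 55 points for fixes, and in both cases remove roughly nine in ten of the unhinted failures, which is consistent with localization being the main bottleneck.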

Method

EVMbench evaluates AI agents by having them find, fix, and exploit smart contract vulnerabilities on a local blockchain, using a dataset of 120 vulnerabilities drawn from 40 real-world security audits.
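
As a rough illustration of how per-task percentages like those in the summary could be tallied, the Python sketch below aggregates pass/fail outcomes by task. It is not EVMbench's actual harness: the `Attempt` record, its field names, and the demo outcomes are all hypothetical, and the grading step that decides success is not modelled.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Attempt:
    vuln_id: str   # hypothetical identifier for one audit finding
    task: str      # "find", "fix", or "exploit"
    success: bool  # outcome as judged by a grader (not modelled here)


def success_rates(attempts):
    """Return {task: fraction of attempts on that task that succeeded}."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for attempt in attempts:
        totals[attempt.task] += 1
        wins[attempt.task] += attempt.success  # True counts as 1
    return {task: wins[task] / totals[task] for task in totals}


# Made-up outcomes for illustration only; these are not EVMbench data.
demo = [
    Attempt("finding-1", "find", True),
    Attempt("finding-1", "fix", False),
    Attempt("finding-1", "exploit", True),
    Attempt("finding-2", "find", False),
    Attempt("finding-2", "fix", True),
    Attempt("finding-2", "exploit", True),
]

print(success_rates(demo))
```

Keeping one record per finding and task makes it straightforward to report each of the three capabilities separately, as the article does.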

In practice

Use AI for smart contract vulnerability detection.
Integrate AI agents into security audit workflows.

Topics

EVMbench
Smart Contract Security
AI Agents
Vulnerability Exploitation
GPT-5.3-Codex

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, AI Security Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.