AI Code Review Only Catches Half of Your Bugs
Summary
AI code review tools that rely on structural analysis catch only about half of all bugs; they miss design flaws and intent violations. Andrew Stellman ran into this "intent ceiling" with a faulty AI-generated bus app: the code was "correct" but did the wrong thing. The limitation matters for security because approximately 50% of security vulnerabilities are design flaws rather than implementation bugs, which makes them invisible to static analysis. Stellman developed the open-source Quality Playbook, a skill for AI tools such as Claude Code, Cursor, and Copilot, which uses AI to derive and verify requirements. The playbook found a long-standing bug in Google's Gson library by analyzing community-derived null-handling requirements, which shows what intent-driven verification can catch that purely structural checks cannot. The article argues that effective AI code verification requires understanding the software's intended purpose and its negative requirements.
Key takeaway
For AI Engineers and MLOps teams building or verifying code, structural AI code review alone will not catch critical design flaws and security vulnerabilities. Add explicit requirements engineering: record the "why" behind each feature and define negative requirements (what the software must *not* do). Consider using the Quality Playbook or adopting its four-step method to derive and verify intent, so that AI-driven development covers design-level bugs as well as structural ones.
Key insights
AI code review needs explicit requirements and intent to catch design flaws and security vulnerabilities.
Principles
- Structural analysis alone misses ~50% of bugs.
- Requirements must include purpose and "why."
- Negative requirements define critical boundaries.
Method
The Quality Playbook splits requirement derivation into four steps: observe behavioral contracts, derive requirements from the contracts and docs, check coverage, and assert completeness. An external contracts file serves as its memory.
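The four steps named above can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the Quality Playbook's implementation: the JSON file layout, the record fields (`id`, `text`, `covers`), and every function name here are hypothetical, since the summary does not describe the playbook's actual format.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical helpers: the real contracts file format is not given in the source.
def save_contracts(path, contracts):
    """Persist observed contracts so later steps can reload them (the 'memory')."""
    Path(path).write_text(json.dumps(contracts, indent=2))

def load_contracts(path):
    return json.loads(Path(path).read_text())

def uncovered_contracts(contracts, requirements):
    """Step 3 (check coverage): ids of contracts that no requirement traces back to."""
    covered = {cid for req in requirements for cid in req["covers"]}
    return [c["id"] for c in contracts if c["id"] not in covered]

def assert_complete(contracts, requirements):
    """Step 4 (assert completeness): fail loudly if any contract is uncovered."""
    missing = uncovered_contracts(contracts, requirements)
    if missing:
        raise AssertionError(f"contracts without requirements: {missing}")

# Step 1 (observe behavioral contracts): made-up examples for illustration.
contracts = [
    {"id": "C1", "text": "serialize(None) returns the string 'null'"},
    {"id": "C2", "text": "deserialize never returns a partially built object"},
]

# Step 2 (derive requirements from contracts and docs): each one lists what it covers.
requirements = [
    {"id": "R1", "text": "Null inputs round-trip without loss", "covers": ["C1"]},
]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "contracts.json"
    save_contracts(path, contracts)
    reloaded = load_contracts(path)

print(uncovered_contracts(reloaded, requirements))  # prints ['C2']
```

In this sketch the coverage check reports `C2`, a contract that was observed but never turned into a requirement, which is the kind of gap the completeness step is meant to surface.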
In practice
- Write down what software guarantees, not just what it does.
- Feed AI design intent from chat logs and discussions.
- Articulate what software must *not* do (negative requirements).
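A negative requirement from the list above can be written down as an executable check. The sketch below is a made-up Python example, not taken from the article: the function names, the `SENSITIVE_FIELDS` set, and the user record are all hypothetical. It shows how a statement of the form "the public payload must not expose credentials" becomes something a test can verify.

```python
import json

# Hypothetical negative requirement: these fields must never appear in public output.
SENSITIVE_FIELDS = {"password_hash", "api_token"}

def to_public_json(user: dict) -> str:
    """Serialize a user record, dropping every field the requirement forbids."""
    public = {k: v for k, v in user.items() if k not in SENSITIVE_FIELDS}
    return json.dumps(public, sort_keys=True)

def violations(payload: str) -> list:
    """Return the forbidden fields present in a JSON payload (empty list if none)."""
    data = json.loads(payload)
    return sorted(SENSITIVE_FIELDS.intersection(data))

user = {"name": "Ada", "password_hash": "x1y2", "api_token": "t0k3n"}

print(violations(to_public_json(user)))  # prints []
print(violations(json.dumps(user)))      # prints ['api_token', 'password_hash']
```

A naive `json.dumps(user)` is structurally valid and would pass a purely structural review, yet it violates the stated boundary. The check only exists because the requirement was articulated first.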
Topics
- AI Code Review
- Requirements Engineering
- Intent Ceiling
- Quality Playbook
- Software Security
Best for: AI Engineer, Software Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AI & ML – Radar.