AI Code Review Only Catches Half of Your Bugs

2026-04-30 · Source: AI & ML – Radar · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cybersecurity & Data Privacy · Depth: Intermediate, long

Summary

AI code review tools are limited to structural analysis and catch only about half of all bugs, missing design flaws and intent violations in particular. Andrew Stellman ran into this "intent ceiling" with a faulty AI-generated bus app: the code was "correct" but did the wrong thing. The limitation matters for security, because approximately 50% of security vulnerabilities are design flaws rather than implementation bugs, which makes them invisible to static analysis. Stellman developed the open-source Quality Playbook, a skill for AI tools like Claude Code, Cursor, and Copilot, which uses AI to derive and verify requirements. The playbook found a long-standing bug in Google's Gson library by analyzing community-derived null-handling requirements, showing what intent-driven verification can catch that purely structural checks cannot. The article argues that effective AI code verification requires understanding the software's intended purpose and its negative requirements.

Key takeaway

For AI Engineers and MLOps teams building or verifying code, structural AI code review alone is not enough to catch critical design flaws and security vulnerabilities. Add explicit requirements engineering: capture the "why" behind each feature and define negative requirements (what the software must *not* do). Consider using a tool like the Quality Playbook, or adopting its four-step method to derive and verify intent, so that review covers intent-level bugs as well as structural ones.

Key insights

AI code review needs explicit requirements and intent to catch design flaws and security vulnerabilities.

Principles

Structural analysis alone misses ~50% of bugs.
Requirements must include purpose and "why."
Negative requirements define critical boundaries.

Method

The Quality Playbook splits requirement derivation into four steps: observe behavioral contracts, derive requirements from the contracts and documentation, check coverage, and assert completeness. An external contracts file serves as its memory.
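
The four steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Quality Playbook's actual implementation: the JSON file layout, the `Requirement` class, and every function name below are hypothetical, and the derivation step (which the article attributes to an AI tool) is stood in for by a hand-written list.

```python
import json
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Requirement:
    """Hypothetical record linking one requirement to one observed contract."""
    req_id: str
    contract_id: str  # the observed contract this requirement traces back to
    statement: str


def save_contracts(path, contracts):
    """Step 1 (observe): persist observed behavioral contracts to a file."""
    Path(path).write_text(json.dumps(contracts, indent=2), encoding="utf-8")


def load_contracts(path):
    """Reload the contracts file so later steps do not depend on chat context."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def check_coverage(contracts, requirements):
    """Step 3 (check coverage): list contract ids that no requirement covers."""
    covered = {r.contract_id for r in requirements}
    return sorted(c["id"] for c in contracts if c["id"] not in covered)


def assert_complete(contracts, requirements):
    """Step 4 (assert completeness): fail loudly if any contract is uncovered."""
    missing = check_coverage(contracts, requirements)
    if missing:
        raise AssertionError(f"contracts without requirements: {missing}")


if __name__ == "__main__":
    # Hypothetical contracts for an imaginary parser, for illustration only.
    contracts = [
        {"id": "C1", "behavior": "parse() returns None for empty input"},
        {"id": "C2", "behavior": "parse() ignores unknown keys"},
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "contracts.json"
        save_contracts(path, contracts)
        loaded = load_contracts(path)

    # Step 2 (derive): written by hand here; an AI tool would propose these.
    requirements = [
        Requirement("R1", "C1", "Empty input must yield None, not an error."),
    ]
    print("uncovered contracts:", check_coverage(loaded, requirements))
```

Run as a script, the example reports C2 as uncovered, which is the gap that `assert_complete` would turn into a hard failure. Keeping the contracts in a file rather than in the conversation is the design choice that matters here: each step can be rerun against the same recorded observations.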

In practice

Write down what software guarantees, not just what it does.
Feed the AI design intent drawn from chat logs and discussions.
Articulate what software must *not* do (negative requirements).
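
As an illustration of the practices above, the sketch below writes one guarantee, its purpose, and one negative requirement next to the code, then checks the negative requirement directly. The bus-departure scenario and all names (`upcoming_departures`, `naive_departures`, `violates_no_past_departures`) are hypothetical; this is not the bug from Stellman's app, only an example of code that is structurally clean yet does the wrong thing.

```python
from datetime import datetime, timedelta


def naive_departures(departures, now):
    """Structurally fine: sorted output, no crashes. But it ignores `now`,
    so it can list buses that have already left."""
    return sorted(departures)


def upcoming_departures(departures, now):
    """Guarantee: returns departures at or after `now`, soonest first.
    Why: a rider uses the list to decide when to leave for the stop.
    Negative requirement: must NOT list a bus that has already departed."""
    return sorted(d for d in departures if d >= now)


def violates_no_past_departures(fn, departures, now):
    """Check the negative requirement against any candidate implementation."""
    return any(d < now for d in fn(departures, now))


if __name__ == "__main__":
    now = datetime(2026, 1, 1, 12, 0)
    departures = [
        now + timedelta(minutes=20),
        now - timedelta(minutes=10),  # already gone
        now + timedelta(minutes=5),
    ]
    for fn in (naive_departures, upcoming_departures):
        bad = violates_no_past_departures(fn, departures, now)
        print(fn.__name__, "violates negative requirement:", bad)
```

Nothing in `naive_departures` looks wrong in isolation, so a purely structural review has nothing to flag. The violation only becomes visible once the negative requirement is written down and checked.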

Topics

AI Code Review
Requirements Engineering
Intent Ceiling
Quality Playbook
Software Security

Best for: AI Engineer, Software Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI & ML – Radar.