AI Code Review Only Catches Half of Your Bugs

Source: AI & ML – Radar · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Cybersecurity & Data Privacy) · Depth: Intermediate, long

Summary

AI code review tools are limited to structural analysis, so they catch only about half of all bugs. What they tend to miss are design flaws and intent violations. Andrew Stellman's experience with a faulty AI-generated bus app illustrated this "intent ceiling": the code is "correct" but does the wrong thing. The limitation is critical for security, because approximately 50% of security vulnerabilities are design flaws rather than implementation bugs, which makes them invisible to static analysis. Stellman developed the open-source Quality Playbook, a skill for AI tools such as Claude Code, Cursor, and Copilot that uses AI to derive requirements and then verify code against them. The playbook found a long-standing bug in Google's Gson library by checking the code against community-derived null-handling requirements, showing that intent-driven verification can catch what purely structural checks miss. The article argues that effective AI code verification requires understanding the software's intended purpose, including its negative requirements.
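The "intent ceiling" is easier to see in code. The following Python sketch is hypothetical and is not the bug from Stellman's bus app; `Departure`, `next_departures`, and `intent_violations` are invented names. The function is typed, sorted, bounded, and never crashes, so a structural review has little to flag, yet it can show buses that have already left because it filters by date instead of by time.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Departure:
    route: str
    time: datetime


def next_departures(departures, now, limit=3):
    # Hypothetical implementation: structurally clean, but it keeps every
    # departure from today, including ones earlier than `now`.
    todays = [d for d in departures if d.time.date() == now.date()]
    return sorted(todays, key=lambda d: d.time)[:limit]


def intent_violations(departures, now, limit=3):
    """Check a hypothetical intent requirement: every departure shown
    as 'next' must leave at or after `now`."""
    return [d for d in next_departures(departures, now, limit) if d.time < now]


if __name__ == "__main__":
    now = datetime(2024, 5, 1, 8, 30)
    schedule = [
        Departure("12", datetime(2024, 5, 1, 8, 15)),
        Departure("12", datetime(2024, 5, 1, 8, 45)),
        Departure("7", datetime(2024, 5, 1, 9, 0)),
    ]
    # Reports the 08:15 departure, which already left.
    print(intent_violations(schedule, now))
```

The fix (filtering on `d.time >= now`) is trivial once the requirement is written down; the hard part is stating the requirement in the first place.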

Key takeaway

For AI Engineers and MLOps teams building or verifying code, structural AI code review alone is insufficient for catching critical design flaws and security vulnerabilities. Integrate explicit requirements engineering: document the "why" behind each feature and define negative requirements, that is, what the software must *not* do. Consider using the Quality Playbook, or adopting its four-step method, to derive and verify intent, so that verification covers design and intent bugs rather than only implementation bugs.
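Negative requirements are most useful when they are executable. A minimal Python sketch, assuming a hypothetical `to_public_json` export function and two illustrative rules (neither comes from the article), expresses each "must not" as a named predicate checked against real output:

```python
import json


def to_public_json(user):
    # Hypothetical export function under test: strips the password hash
    # but passes every other field through unchanged, including nulls.
    return json.dumps({k: v for k, v in user.items() if k != "password_hash"})


# Each negative requirement names something the output must NOT do,
# paired with a predicate that returns True when the requirement holds.
NEGATIVE_REQUIREMENTS = [
    ("must not expose password_hash", lambda data: "password_hash" not in data),
    ("must not emit null fields", lambda data: all(v is not None for v in data.values())),
]


def violated_negative_requirements(export_fn, record):
    """Return the names of negative requirements the export violates."""
    data = json.loads(export_fn(record))
    return [name for name, holds in NEGATIVE_REQUIREMENTS if not holds(data)]


if __name__ == "__main__":
    sample = {"id": 7, "email": "a@example.com", "password_hash": "x1f9", "nickname": None}
    print(violated_negative_requirements(to_public_json, sample))
```

Here the password rule holds but the null rule fails, because `nickname` is serialized as `null`. A purely structural check has no way to know that either rule exists.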

Key insights

AI code review needs explicit requirements and intent to catch design flaws and security vulnerabilities.

Method

The Quality Playbook splits requirement derivation into four steps: observe behavioral contracts, derive requirements from those contracts and the documentation, check coverage, and assert completeness. An external contracts file serves as memory across the steps.
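A minimal sketch of the four steps in Python, assuming that a behavioral contract can be reduced to "the return value or exception type observed for each probe input". This is not the Quality Playbook's implementation (the Playbook is a skill for AI coding tools, not a Python library); `normalize_name`, the probe names, and the JSON layout of `contracts.json` are all hypothetical. The contracts file is written after step 1 and read back for steps 2 to 4, mirroring the idea of an external file used as memory.

```python
import json
import tempfile
from pathlib import Path


def normalize_name(value):
    # Hypothetical code under review: looks fine structurally, but silently
    # turns None into the string "none" instead of rejecting it.
    return str(value).strip().lower()


def observe_contracts(fn, probes):
    """Step 1: record what the code actually does for each probe input."""
    observed = {}
    for name, arg in probes.items():
        try:
            observed[name] = f"returns:{fn(arg)!r}"
        except Exception as exc:
            observed[name] = f"raises:{type(exc).__name__}"
    return observed


def derive_requirements(observed, documented):
    """Step 2: documented intent wins; undocumented observed behavior
    becomes a candidate requirement that a human should confirm."""
    requirements = {n: {"expect": e, "source": "docs"} for n, e in documented.items()}
    for name, behavior in observed.items():
        requirements.setdefault(name, {"expect": behavior, "source": "observed"})
    return requirements


def check_coverage(requirements, observed):
    """Step 3: requirements that no probe exercised."""
    return sorted(name for name in requirements if name not in observed)


def assert_completeness(requirements, observed):
    """Step 4: requirements whose observed behavior contradicts the expectation."""
    return sorted(
        (name, req["expect"], observed[name])
        for name, req in requirements.items()
        if name in observed and observed[name] != req["expect"]
    )


def run_playbook_sketch(fn, probes, documented, contracts_path):
    # Persist step 1's output so later steps read from the file, not from memory.
    contracts_path.write_text(json.dumps(observe_contracts(fn, probes), indent=2))
    observed = json.loads(contracts_path.read_text())
    requirements = derive_requirements(observed, documented)
    return check_coverage(requirements, observed), assert_completeness(requirements, observed)


if __name__ == "__main__":
    documented = {  # hypothetical expectations taken from docs
        "none_input": "raises:ValueError",
        "padded": "returns:'alice'",
        "empty": "raises:ValueError",
    }
    probes = {"none_input": None, "padded": "  Alice "}
    with tempfile.TemporaryDirectory() as tmp:
        uncovered, violations = run_playbook_sketch(
            normalize_name, probes, documented, Path(tmp) / "contracts.json")
    print("uncovered:", uncovered)
    print("violations:", violations)
```

Run as a script, it reports `empty` as uncovered (documented but never probed) and flags `none_input`, where the code returns `'none'` instead of raising. That is a null-handling gap of the same general kind the summary attributes to Gson, though this example is entirely invented.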

Best for: AI Engineer, Software Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI & ML – Radar.