The test suite as a regression sensor

2026-05-27 · Source: Martin Fowler · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, extended

Summary

This article details experiments with "sensors" designed to improve the maintainability of codebases generated by coding agents, using an internal analytics dashboard built with TypeScript, Next.js, and React as the test bed. The author used a mix of Claude Sonnet, Claude Opus, and Cursor's composer-2 models. The experiments covered static code analysis, including ESLint with custom rules that target common shortcomings of AI-generated code (e.g., maximum argument count, file and function length, cyclomatic complexity), and dependency-cruiser for enforcing module layering. Custom coupling metrics were also explored, but the raw numbers proved less useful than AI-driven modularity reviews using Vlad Khononov's "Modularity Skills," which identified issues such as duplicated route code and inconsistent backend calls. Mutation testing with Stryker proved crucial for detecting assertion gaps in AI-generated test suites, even those with 100% statement coverage, which is what qualifies the test suite as a regression sensor. The article concludes that computational sensors excel at file-level issues, while inferential (LLM-based) sensors are vital for the semantic interpretation of cross-file concerns such as modularity.
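As a rough illustration of the static-analysis side, the rule set named in the summary maps onto four core ESLint rules. The sketch below is an assumption-laden example, not the article's configuration: the thresholds and file globs are hypothetical, and the parser setup needed for TypeScript files is omitted.

```typescript
// Core ESLint rules matching the checks listed in the summary.
// All thresholds below are illustrative assumptions, not values from the article.
const maintainabilityRules = {
  "max-params": ["error", 4],
  "max-lines": ["error", { max: 300, skipBlankLines: true, skipComments: true }],
  "max-lines-per-function": ["error", { max: 60, skipBlankLines: true, skipComments: true }],
  complexity: ["error", 10],
};

// Flat-config entry applying the rules to TypeScript sources (globs are assumed).
const maintainabilityConfig = [
  {
    files: ["**/*.ts", "**/*.tsx"],
    rules: maintainabilityRules,
  },
];

console.log(`configured ${Object.keys(maintainabilityRules).length} rules for ${maintainabilityConfig.length} file group(s)`);
```

In an ESLint flat-config file, the `maintainabilityConfig` array would be the default export, alongside whatever TypeScript parser configuration the project already uses.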

Key takeaway

AI engineers building with coding agents need a robust sensor system to manage codebase maintainability and technical debt. Relying solely on AI-generated tests or basic linting creates a false sense of security; instead, implement custom static analysis rules and use mutation testing to verify that the tests actually detect regressions. The review process should also incorporate LLM-based modularity reviews, which provide semantic interpretation beyond raw metrics and help identify and address complex design issues early.

Key insights

The maintainability of AI-generated code benefits from a system of computational and inferential sensors.

Principles

Custom sensor feedback guides AI self-correction.
LLMs provide semantic interpretation for complex code issues.
Mutation testing reveals true test suite effectiveness.
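The third principle can be made concrete with a hand-rolled sketch of what a mutation tool does. This is not Stryker's API; the shipping rule, the function names, and the modelling of a test as a boolean-returning function are all assumptions chosen for illustration. A "mutant" is the original code with one small change, and a test suite "kills" it only if a test that passes on the original fails on the mutant.

```typescript
type Shipping = (total: number) => number;

// Function under test (hypothetical rule): orders of 50 or more ship free.
const shippingCost: Shipping = (total) => (total >= 50 ? 0 : 5);

// A mutant of the kind a mutation tool generates: `>=` replaced by `>`.
const shippingCostMutant: Shipping = (total) => (total > 50 ? 0 : 5);

// A test is modelled as a function returning true when all its checks pass.
type Test = (impl: Shipping) => boolean;

// Weak test: exercises both branches, so coverage looks complete,
// but it never probes the boundary value.
const weakTest: Test = (impl) => impl(100) === 0 && impl(10) === 5;

// Strong test: same checks plus the boundary at exactly 50.
const strongTest: Test = (impl) => weakTest(impl) && impl(50) === 0;

// A mutant is killed when the test passes on the original and fails on the mutant.
function kills(test: Test, original: Shipping, mutant: Shipping): boolean {
  return test(original) && !test(mutant);
}

console.log("weak test kills mutant:", kills(weakTest, shippingCost, shippingCostMutant)); // false
console.log("strong test kills mutant:", kills(strongTest, shippingCost, shippingCostMutant)); // true
```

The surviving mutant under `weakTest` is the assertion gap: the suite executes every branch yet would not notice the boundary regression.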

Method

Implement a sensor system combining computational tools (ESLint, dependency-cruiser, type checkers, mutation testing) with inferential LLM-based reviews for maintainability feedback to coding agents.
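To show what a computational layering sensor checks, here is a minimal self-written sketch. It is not dependency-cruiser's API (that tool expresses such constraints as rules in its own configuration); the layer names, the `src/<layer>/` path convention, and the sample import edges are all hypothetical.

```typescript
// Hypothetical layer order, outermost first. A module may import from its own
// layer or from layers further down the list, never from layers above it.
const LAYERS = ["ui", "application", "domain"] as const;
type Layer = (typeof LAYERS)[number];

interface ImportEdge {
  from: string; // importing file
  to: string;   // imported file
}

// Assumed convention: the layer is the directory directly under src/.
function layerOf(path: string): Layer | undefined {
  return LAYERS.find((layer) => path.startsWith(`src/${layer}/`));
}

// Returns every edge where a lower layer imports from a higher one.
function findLayerViolations(edges: ImportEdge[]): ImportEdge[] {
  return edges.filter(({ from, to }) => {
    const fromLayer = layerOf(from);
    const toLayer = layerOf(to);
    if (fromLayer === undefined || toLayer === undefined) return false; // outside the layered tree
    return LAYERS.indexOf(fromLayer) > LAYERS.indexOf(toLayer);
  });
}

const sampleEdges: ImportEdge[] = [
  { from: "src/ui/Dashboard.tsx", to: "src/application/loadReport.ts" },
  { from: "src/application/loadReport.ts", to: "src/domain/report.ts" },
  { from: "src/domain/report.ts", to: "src/ui/Chart.tsx" }, // domain reaching up into ui
];

for (const v of findLayerViolations(sampleEdges)) {
  console.log(`layer violation: ${v.from} imports ${v.to}`);
}
```

A check like this is purely structural, which is why it suits a computational sensor: it needs no interpretation of what the modules mean, only where they live and what they import.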

In practice

Configure custom ESLint formatters for AI guidance.
Use dependency-cruiser to enforce module layering.
Employ mutation testing to validate AI-generated tests.
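The first practice can be sketched as follows. An ESLint formatter is a function that receives the lint results and returns a string, so it can append guidance aimed at a coding agent. The interfaces below are minimal local stand-ins for the fields of ESLint's result objects that the sketch uses, and the hint texts and the `formatForAgent` name are assumptions, not taken from the article.

```typescript
// Minimal stand-ins for the parts of ESLint's result objects used here.
interface LintMessage {
  ruleId: string | null;
  severity: 1 | 2; // 1 = warning, 2 = error
  message: string;
  line: number;
  column: number;
}

interface LintResult {
  filePath: string;
  messages: LintMessage[];
}

// Hypothetical per-rule guidance for the agent.
const HINTS: Record<string, string> = {
  "max-params": "Group related arguments into a single options object.",
  "max-lines": "Split the file along its responsibilities.",
  "max-lines-per-function": "Extract cohesive steps into named helper functions.",
  complexity: "Replace nested conditionals with early returns or a lookup table.",
};

// Formatter: one line per finding, followed by a fix suggestion when one is known.
function formatForAgent(results: LintResult[]): string {
  const lines: string[] = [];
  for (const result of results) {
    for (const msg of result.messages) {
      const rule = msg.ruleId ?? "unknown";
      const level = msg.severity === 2 ? "error" : "warning";
      lines.push(`${result.filePath}:${msg.line}:${msg.column} ${level} [${rule}] ${msg.message}`);
      const hint = HINTS[rule];
      if (hint !== undefined) lines.push(`  suggested fix: ${hint}`);
    }
  }
  return lines.length > 0 ? lines.join("\n") : "No lint findings.";
}

const sampleResults: LintResult[] = [
  {
    filePath: "src/report.ts",
    messages: [
      { ruleId: "max-params", severity: 2, message: "Function has too many parameters.", line: 12, column: 1 },
    ],
  },
];

console.log(formatForAgent(sampleResults));
```

To use it with ESLint, the function would be exported from its own module and selected with the `--format` option pointing at that file.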

Topics

Coding Agents
Code Maintainability
Static Code Analysis
Dependency-cruiser
Mutation Testing
LLM-based Code Review

Best for: AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Martin Fowler.