The test suite as a regression sensor

· Source: Martin Fowler · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, extended

Summary

This article reports experiments with "sensors" meant to keep code generated by coding agents maintainable, run on an internal analytics dashboard built with TypeScript, Next.js, and React. The author used a mix of Claude Sonnet, Claude Opus, and Cursor's composer-2 models. The experiments covered static code analysis: ESLint with custom rules that target common AI shortcomings (maximum argument count, file and function length, cyclomatic complexity) and dependency-cruiser to enforce module layering. Custom coupling metrics were also explored, but the raw numbers proved less useful than AI-driven modularity reviews based on Vlad Khononov's "Modularity Skills," which surfaced issues such as duplicated route code and inconsistent backend calls. Mutation testing with Stryker proved crucial for exposing assertion gaps in AI-generated test suites that nonetheless reached 100% statement coverage, which matters when the test suite is meant to serve as a regression sensor. The article concludes that computational sensors excel at file-level issues, while inferential (LLM-based) sensors are needed for the semantic interpretation of cross-file concerns such as modularity.
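As a rough illustration of the static-analysis sensors described above, the rule families named in the summary map onto ESLint's built-in `max-params`, `max-lines`, `max-lines-per-function`, and `complexity` rules. The sketch below is a minimal flat-config fragment; the threshold values and the variable names `maintainabilityRules` and `eslintConfig` are assumptions chosen for illustration, not the settings used in the original article.

```typescript
// Illustrative thresholds only; the article's actual values are not given here.
const maintainabilityRules = {
  // Cap the number of parameters a function may declare.
  "max-params": ["error", { max: 4 }],
  // Cap file length, ignoring blank lines and comments.
  "max-lines": ["error", { max: 300, skipBlankLines: true, skipComments: true }],
  // Cap function length, ignoring blank lines and comments.
  "max-lines-per-function": ["error", { max: 60, skipBlankLines: true, skipComments: true }],
  // Cap cyclomatic complexity per function.
  complexity: ["error", { max: 10 }],
};

// Flat-config entry. In a real ESLint config file this array would be the
// default export, and linting .ts/.tsx files also needs a TypeScript parser
// (for example the one from typescript-eslint), which is omitted here.
const eslintConfig = [
  { files: ["**/*.ts", "**/*.tsx"], rules: maintainabilityRules },
];
```

Setting the rules to `"error"` rather than `"warn"` is a design choice: a failing lint run gives the coding agent an unambiguous signal to act on.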

Key takeaway

If you are an AI engineer building with coding agents, integrate a robust sensor system to manage codebase maintainability and technical debt. Relying solely on AI-generated tests or basic linting creates a false sense of security; instead, add custom static analysis rules and use mutation testing to verify that the tests actually catch regressions. Your review process should also include LLM-based modularity reviews: they provide semantic interpretation beyond raw metrics and help you identify and address complex design issues early.
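To make the false sense of security concrete, here is a small hand-rolled sketch of the idea behind mutation testing. It does not use Stryker's API; the function `exceedsThreshold`, the mutant, and both suites are hypothetical. A mutation tool makes small changes such as turning `>` into `>=` and checks whether any test fails. If none does, the mutant "survives" and the suite has a gap, even though every statement was executed.

```typescript
// Function under test: flags a metric that exceeds its alert threshold.
type Predicate = (value: number, threshold: number) => boolean;

const exceedsThreshold: Predicate = (value, threshold) => value > threshold;

// A hand-written mutant of the kind a mutation tool generates: `>` becomes `>=`.
const mutant: Predicate = (value, threshold) => value >= threshold;

// A test suite is modeled as a function that returns true when all checks pass.
type Suite = (fn: Predicate) => boolean;

// Weak suite: executes every statement (full statement coverage) but never
// probes the boundary, so it cannot tell `>` from `>=`.
const weakSuite: Suite = (fn) => fn(10, 5) === true && fn(1, 5) === false;

// Strong suite: same checks plus the boundary case value === threshold.
const strongSuite: Suite = (fn) => weakSuite(fn) && fn(5, 5) === false;

// A mutant survives when the suite passes on both the original and the mutant.
function mutantSurvives(suite: Suite): boolean {
  return suite(exceedsThreshold) && suite(mutant);
}

console.log(mutantSurvives(weakSuite));   // true: the gap goes undetected
console.log(mutantSurvives(strongSuite)); // false: the boundary check kills the mutant
```

A surviving mutant is the actionable signal: it points at a specific line whose behavior no test pins down.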

Key insights

The maintainability of AI-generated code benefits from a system that combines computational and inferential sensors.

Principles

Method

Implement a sensor system combining computational tools (ESLint, dependency-cruiser, type checkers, mutation testing) with inferential LLM-based reviews for maintainability feedback to coding agents.
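One way to picture the method above is as a single feedback step that runs every sensor and hands the combined findings back to the coding agent. The sketch below is an assumption about how such a step could be structured, not the author's implementation: the `Finding` and `Sensor` types, `buildAgentFeedback`, and both stub sensors are hypothetical, and real sensors would typically run asynchronously by invoking tools or an LLM rather than returning canned data.

```typescript
type SensorKind = "computational" | "inferential";

interface Finding {
  sensor: string;   // which tool or review produced the finding
  kind: SensorKind;
  file?: string;    // omitted for cross-file concerns
  message: string;
}

// A sensor inspects the codebase and returns zero or more findings.
// Kept synchronous here for brevity.
type Sensor = () => Finding[];

// Run every sensor and format the findings as one feedback message.
function buildAgentFeedback(sensors: Sensor[]): string {
  const findings: Finding[] = [];
  for (const sensor of sensors) {
    findings.push(...sensor());
  }
  if (findings.length === 0) {
    return "All sensors passed.";
  }
  return findings
    .map((f) => `[${f.kind}:${f.sensor}] ${f.file ?? "(cross-file)"} - ${f.message}`)
    .join("\n");
}

// Stub sensors with canned findings, standing in for real tool runs.
const lintStub: Sensor = () => [
  {
    sensor: "eslint",
    kind: "computational",
    file: "src/routes/report.ts",
    message: "function exceeds the maximum number of parameters",
  },
];

const modularityReviewStub: Sensor = () => [
  {
    sensor: "modularity-review",
    kind: "inferential",
    message: "duplicate route code across handlers",
  },
];

console.log(buildAgentFeedback([lintStub, modularityReviewStub]));
```

Tagging each finding with its kind keeps file-level, deterministic results distinguishable from LLM-based judgments, which may deserve a second look before the agent acts on them.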

In practice

Best for: AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Martin Fowler.