Are LLMs Ready for Anti-Pattern Detection in Microservice Architectures?

2026-02-24 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

The study investigates whether Large Language Models (LLMs) are ready to detect architectural anti-patterns (APs) in microservice architectures (MSAs), using a prompt-based analysis pipeline over static repository artifacts. The researchers evaluated three general-purpose LLMs (ChatGPT 5.2, Gemini 3 Pro, and Qwen 3.5 Plus) on a benchmark of 13 microservice repositories annotated with 16 architectural anti-patterns, and compared their precision and recall against MARS, a static-analysis tool. The results indicate that LLMs offer useful support and are competitive on several anti-patterns, particularly those that can be inferred from local, heterogeneous, or semantically rich evidence, such as "No API Versioning" (NAV) and "Hardcoded Endpoint" (HE). They are weaker on anti-patterns that require explicit structural or cross-service dependency evidence, such as "Cyclic Dependency" (CD) and "Shared Persistence" (SP), where static analysis remains more reliable. The findings suggest that LLMs are a promising complement to traditional analyzers in architectural assessment, not a replacement for them.

Key takeaway

If you are an AI architect evaluating microservice system quality, integrate LLM-based tools as a complementary layer on top of your existing static analyzers rather than as a substitute. Static tools such as MARS remain the more reliable choice for structural anti-patterns such as cyclic dependencies, while LLMs can identify issues that require semantic interpretation, such as "No API Versioning" or "Hardcoded Endpoint." Consider a hybrid approach that gives LLMs richer structural inputs, such as dependency graphs, to broaden detection coverage and help limit architectural decay.

Key insights

LLMs complement static analysis for microservice anti-pattern detection, excelling with semantic cues but struggling with structural dependencies.

Principles

LLMs excel with semantic, local evidence.
Static analysis is superior for structural proof.
The interpretive flexibility of LLMs risks false positives.
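
To illustrate why structural anti-patterns favor deterministic analysis, the sketch below checks a service dependency graph for a cycle with a depth-first search. It is not how MARS works internally (the summary does not describe that); the graph format, the `find_cycle` name, and the service names are all assumptions made for illustration.

```python
def find_cycle(graph):
    """Return one dependency cycle as a list of services, or None.

    `graph` maps each service name to the services it calls
    (a hypothetical input format, not one defined by the study).
    """
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path = []

    def visit(node):
        color[node] = GRAY
        path.append(node)
        for dep in graph.get(node, ()):
            state = color.get(dep, WHITE)
            if state == GRAY:
                # dep is already on the current path: a cycle is closed here.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                found = visit(dep)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(graph):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical services: orders -> payments -> inventory -> orders
services = {
    "orders": ["payments"],
    "payments": ["inventory"],
    "inventory": ["orders"],
    "gateway": ["orders"],
}
cycle = find_cycle(services)
```

Given an explicit graph, the answer is a matter of traversal rather than interpretation, which is consistent with the finding that static analysis is more reliable for "Cyclic Dependency."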

Method

The study used a four-step pipeline: flattening each repository with Repomix, running parametric, AP-specific LLM prompts, running MARS, and analyzing the results by computing precision and recall against the ground truth.
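
The final analysis step can be sketched as follows. This is a minimal illustration, assuming that detections and ground truth are recorded as (repository, anti-pattern) pairs; the summary does not state the granularity the study actually used, and the function name and sample data are hypothetical.

```python
from collections import defaultdict


def precision_recall(predicted, ground_truth):
    """Compute per-anti-pattern (precision, recall).

    Both arguments are sets of (repository, anti_pattern) pairs.
    A metric is None when its denominator is zero.
    """
    counts = defaultdict(lambda: {"tp": 0, "fp": 0, "fn": 0})
    for repo, ap in predicted:
        key = "tp" if (repo, ap) in ground_truth else "fp"
        counts[ap][key] += 1
    for repo, ap in ground_truth - predicted:
        counts[ap]["fn"] += 1

    scores = {}
    for ap, c in counts.items():
        tp, fp, fn = c["tp"], c["fp"], c["fn"]
        precision = tp / (tp + fp) if (tp + fp) else None
        recall = tp / (tp + fn) if (tp + fn) else None
        scores[ap] = (precision, recall)
    return scores


# Made-up sample data, for illustration only (not results from the study).
ground_truth = {("repo-a", "NAV"), ("repo-b", "NAV"), ("repo-a", "CD")}
predicted = {("repo-a", "NAV"), ("repo-c", "NAV")}
scores = precision_recall(predicted, ground_truth)
```

In this sample, NAV has one true positive, one false positive, and one missed instance, while CD is missed entirely, so its precision is undefined and its recall is zero.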

In practice

Use LLMs for qualitative design judgment.
Combine LLMs with structural analysis tools.
Prioritize static analysis for explicit dependencies.
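
One way to combine the two sources is to weight each detector by the kind of evidence an anti-pattern needs. The sketch below is a hypothetical triage policy, not something prescribed by the study: the grouping of CD and SP as structural and NAV and HE as semantic follows the summary, but the function name, input format, and the "likely" and "review" labels are assumptions.

```python
# Anti-pattern abbreviations as used in the summary.
STRUCTURAL = {"CD", "SP"}   # static analysis reported as more reliable
SEMANTIC = {"NAV", "HE"}    # LLMs reported as competitive


def merge_findings(static_findings, llm_findings):
    """Merge findings into sorted (anti_pattern, target, status) tuples.

    Each argument maps an anti-pattern abbreviation to a set of flagged
    targets (for example, service names).
    """
    merged = []
    for ap in sorted(set(static_findings) | set(llm_findings)):
        static_hits = static_findings.get(ap, set())
        llm_hits = llm_findings.get(ap, set())
        for target in sorted(static_hits | llm_hits):
            in_static = target in static_hits
            in_llm = target in llm_hits
            if in_static and in_llm:
                status = "likely"      # both detectors agree
            elif ap in STRUCTURAL and in_static:
                status = "likely"      # trust static analysis here
            elif ap in SEMANTIC and in_llm:
                status = "likely"      # trust the LLM here
            else:
                status = "review"      # weaker evidence: ask a human
            merged.append((ap, target, status))
    return merged


# Hypothetical findings, for illustration only.
static_findings = {"CD": {"orders"}, "NAV": set()}
llm_findings = {"CD": {"billing"}, "NAV": {"gateway"}}
triage = merge_findings(static_findings, llm_findings)
```

Sending LLM-only structural findings to review, rather than accepting them, reflects the false-positive risk noted above.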

Topics

Microservice Architectures
Anti-Pattern Detection
Large Language Models
Static Analysis
Architectural Assessment
Prompt Engineering
MARS Tool

Best for: AI Scientist, AI Architect, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.