Are LLMs Ready for Anti-Pattern Detection in Microservice Architectures?

Source: cs.SE updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning; Software Development & Engineering) · Depth: Expert, extended

Summary

A study investigated whether Large Language Models (LLMs) are ready to detect architectural anti-patterns (APs) in microservice architectures (MSAs), using a prompt-based analysis pipeline over static repository artifacts. The researchers evaluated three general-purpose LLMs (ChatGPT 5.2, Gemini 3 Pro, and Qwen 3.5 Plus) on a benchmark of 13 microservice repositories annotated with 16 architectural anti-patterns, and compared their precision and recall against MARS, a static-analysis tool. The results indicate that LLMs offer useful support: they achieve competitive performance on several anti-patterns, particularly those inferred from local, heterogeneous, or semantically rich evidence, such as "No API Versioning" (NAV) and "Hardcoded Endpoint" (HE). They are weaker on anti-patterns that require explicit structural or cross-service dependency evidence, such as "Cyclic Dependency" (CD) and "Shared Persistence" (SP), where static analysis remains more reliable. The findings suggest that LLMs are a promising complement to traditional analyzers in architectural assessment, not a replacement.

Key takeaway

AI architects evaluating microservice system quality should integrate LLM-based tools as a complementary layer on top of existing static analyzers. Traditional tools such as MARS reliably detect structural anti-patterns such as cyclic dependencies, while LLMs can identify issues that require semantic interpretation, such as "No API Versioning" or "Hardcoded Endpoint." Consider a hybrid approach that combines LLMs with richer structural inputs, such as dependency graphs, to broaden detection coverage and reduce architectural decay.
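One way to picture the hybrid approach is a small merge step that combines the two detectors' findings and flags reports that come only from the detector considered weaker for that kind of anti-pattern. The sketch below is illustrative only: the routing table, the labels ("confirmed", "accepted", "needs-review"), the service names, and the function names are assumptions made for this example, not something the study or MARS defines.

```python
from typing import Dict, Optional, Set

# Assumption: this split mirrors the examples named in the summary,
# not a rule published by the study.
STRUCTURAL = {"CD", "SP"}  # Cyclic Dependency, Shared Persistence
SEMANTIC = {"NAV", "HE"}   # No API Versioning, Hardcoded Endpoint


def preferred_detector(anti_pattern: str) -> Optional[str]:
    """Return the detector assumed to be more reliable for this anti-pattern."""
    if anti_pattern in STRUCTURAL:
        return "static"
    if anti_pattern in SEMANTIC:
        return "llm"
    return None  # no preference assumed for other anti-patterns


def merge_findings(
    static_findings: Dict[str, Set[str]],
    llm_findings: Dict[str, Set[str]],
) -> Dict[str, Dict[str, str]]:
    """Label each (service, anti-pattern) report by how much it can be trusted."""
    merged: Dict[str, Dict[str, str]] = {}
    for service in sorted(set(static_findings) | set(llm_findings)):
        static_aps = static_findings.get(service, set())
        llm_aps = llm_findings.get(service, set())
        labels: Dict[str, str] = {}
        for ap in sorted(static_aps | llm_aps):
            if ap in static_aps and ap in llm_aps:
                labels[ap] = "confirmed"  # both detectors agree
            else:
                reporter = "static" if ap in static_aps else "llm"
                preferred = preferred_detector(ap)
                # Accept a single report only from the preferred detector
                # (or when no preference is assumed).
                labels[ap] = "accepted" if preferred in (None, reporter) else "needs-review"
        merged[service] = labels
    return merged


# Hypothetical findings keyed by service name.
static_report = {"orders": {"CD"}, "billing": {"NAV"}}
llm_report = {"orders": {"NAV", "CD"}, "billing": {"SP"}}
merged_report = merge_findings(static_report, llm_report)
```

In this sketch a structural anti-pattern reported only by the LLM is routed to manual review instead of being discarded, which keeps the LLM in a supporting role while static analysis stays authoritative for dependency-level evidence.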

Key insights

LLMs complement static analysis for microservice anti-pattern detection, excelling with semantic cues but struggling with structural dependencies.

Method

The study used a four-step pipeline: repository flattening with Repomix, execution of parametric, AP-specific LLM prompts, MARS execution, and data analysis that measures precision and recall against the ground truth.
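The final data-analysis step can be sketched as a per-anti-pattern comparison of reported findings against the annotated ground truth. This is a minimal illustration that assumes a finding is a (repository, anti-pattern) pair; the repository names, the `Score` class, and `score_by_anti_pattern` are hypothetical and do not come from the study's tooling.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set, Tuple

# Assumption: a finding is identified by (repository, anti-pattern code).
Finding = Tuple[str, str]


@dataclass(frozen=True)
class Score:
    true_positives: int
    false_positives: int
    false_negatives: int

    @property
    def precision(self) -> Optional[float]:
        """TP / (TP + FP); None when the detector reported nothing."""
        reported = self.true_positives + self.false_positives
        return self.true_positives / reported if reported else None

    @property
    def recall(self) -> Optional[float]:
        """TP / (TP + FN); None when the ground truth has no instances."""
        actual = self.true_positives + self.false_negatives
        return self.true_positives / actual if actual else None


def score_by_anti_pattern(
    predicted: Set[Finding], ground_truth: Set[Finding]
) -> Dict[str, Score]:
    """Compute one Score per anti-pattern code seen in either set."""
    scores: Dict[str, Score] = {}
    for ap in sorted({code for _, code in predicted | ground_truth}):
        pred = {f for f in predicted if f[1] == ap}
        truth = {f for f in ground_truth if f[1] == ap}
        scores[ap] = Score(
            true_positives=len(pred & truth),
            false_positives=len(pred - truth),
            false_negatives=len(truth - pred),
        )
    return scores


# Hypothetical annotations and detector output.
ground_truth = {("repo-a", "NAV"), ("repo-b", "NAV"), ("repo-a", "CD")}
predicted = {("repo-a", "NAV"), ("repo-c", "NAV")}
scores = score_by_anti_pattern(predicted, ground_truth)
```

Scoring each anti-pattern separately, instead of pooling all findings, is what makes it possible to see a detector doing well on NAV while missing CD entirely.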

Best for: AI Scientist, AI Architect, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.