Are LLMs Ready for Anti-Pattern Detection in Microservice Architectures?
Summary
A study investigated the readiness of Large Language Models (LLMs) for detecting architectural anti-patterns (APs) in microservice architectures (MSAs) using a prompt-based analysis pipeline over static repository artifacts. Researchers evaluated three general-purpose LLMs—ChatGPT 5.2, Gemini 3 Pro, and Qwen 3.5 Plus—on a benchmark of 13 microservice repositories annotated with 16 architectural anti-patterns. Their performance was compared against MARS, a static-analysis tool, using precision and recall metrics. Results indicate that LLMs offer useful support, achieving competitive performance on several anti-patterns, particularly those inferred from local, heterogeneous, or semantically rich evidence like "No API Versioning" (NAV) and "Hardcoded Endpoint" (HE). However, LLMs showed limitations on anti-patterns requiring explicit structural or cross-service dependency evidence, such as "Cyclic Dependency" (CD) and "Shared Persistence" (SP), where static analysis remains more reliable. The findings suggest LLMs are a promising complementary aid, not a replacement, for traditional analyzers in architectural assessment.
Key takeaway
If you are an AI architect evaluating microservice system quality, integrate LLM-based tools as a complementary layer on top of your existing static analyzers. Traditional tools like MARS remain the more reliable choice for structural anti-patterns such as cyclic dependencies, while LLMs can effectively identify issues that require semantic interpretation, like "No API Versioning" or "Hardcoded Endpoint." Consider a hybrid approach that gives LLMs richer structural inputs, such as dependency graphs, to broaden detection coverage and reduce architectural decay.
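One way to picture the hybrid approach is a prompt that carries both the flattened repository text and dependency edges extracted by a separate tool. The sketch below is illustrative only: the template wording, the `build_prompt` helper, the sample definition, and the edge format are assumptions, not the prompts used in the study.

```python
from typing import Iterable, Tuple

# Hypothetical template; the study's actual prompt text is not reproduced here.
PROMPT_TEMPLATE = """You are reviewing a microservice system for one architectural anti-pattern.
Anti-pattern: {name}
Definition: {definition}

Service dependency edges (caller -> callee):
{edges}

Repository contents:
{repository}

Answer PRESENT or ABSENT, then cite the files that support the answer."""


def build_prompt(
    name: str,
    definition: str,
    repository: str,
    edges: Iterable[Tuple[str, str]],
) -> str:
    """Fill the template with one anti-pattern and the structural evidence."""
    edge_lines = "\n".join(f"- {src} -> {dst}" for src, dst in sorted(edges))
    if not edge_lines:
        edge_lines = "- (none supplied)"
    return PROMPT_TEMPLATE.format(
        name=name,
        definition=definition,
        edges=edge_lines,
        repository=repository,
    )


# Illustrative inputs: service names and the definition are made up.
prompt = build_prompt(
    name="Cyclic Dependency",
    definition="Services call each other in a loop (illustrative wording).",
    repository="<flattened repository text goes here>",
    edges=[("payments", "orders"), ("orders", "payments")],
)
```

Listing the edges explicitly hands the model the cross-service evidence it would otherwise have to infer from scattered files, which is where the summary reports LLMs struggling.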
Key insights
LLMs complement static analysis for microservice anti-pattern detection, excelling with semantic cues but struggling with structural dependencies.
Principles
- LLMs excel with semantic, local evidence.
- Static analysis is superior for structural proof.
- The interpretive flexibility of LLMs raises the risk of false positives.
Method
The study used a four-step pipeline: repository flattening with Repomix, execution of parametric, AP-specific LLM prompts, MARS execution, and data analysis that computed precision and recall against the ground truth.
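The final step, scoring detections against the ground truth, can be sketched as follows. This is a minimal illustration using the standard definitions of precision and recall. The `Finding` type, the repository names, and the sample data are hypothetical and are not taken from the benchmark.

```python
from typing import Set, Tuple

# A finding is a (repository, anti-pattern code) pair, e.g. ("repo-a", "NAV").
Finding = Tuple[str, str]


def precision_recall(
    predicted: Set[Finding], ground_truth: Set[Finding]
) -> Tuple[float, float]:
    """Return (precision, recall) for one detector's findings."""
    true_positives = len(predicted & ground_truth)
    # Convention assumed here: an empty denominator yields 0.0.
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(ground_truth) if ground_truth else 0.0
    return precision, recall


# Made-up annotations and detector output, for illustration only.
ground_truth = {("repo-a", "NAV"), ("repo-a", "CD"), ("repo-b", "HE")}
llm_findings = {("repo-a", "NAV"), ("repo-b", "HE"), ("repo-b", "SP")}

p, r = precision_recall(llm_findings, ground_truth)
print(f"precision={p:.2f} recall={r:.2f}")
```

Running the same function over the LLM output and the MARS output for each anti-pattern gives directly comparable numbers per detector.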
In practice
- Use LLMs for qualitative design judgment.
- Combine LLMs with structural analysis tools.
- Prioritize static analysis for explicit dependencies.
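To see why explicit dependencies suit static analysis, consider "Cyclic Dependency": once a service call graph exists, finding a cycle is a deterministic graph traversal. The sketch below uses a plain depth-first search over a hypothetical call graph. It is not how MARS works internally, and the service names are invented.

```python
from typing import Dict, List, Optional


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one cycle as a list of services (first == last), or None."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    for targets in graph.values():
        for target in targets:
            color.setdefault(target, WHITE)  # include callee-only services
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GRAY
        path.append(node)
        for nxt in graph.get(node, []):
            if color[nxt] == GRAY:  # back edge: nxt is already on the path
                return path[path.index(nxt):] + [nxt]
            if color[nxt] == WHITE:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        color[node] = BLACK
        return None

    for node in list(color):
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None


# Hypothetical call graph: caller -> list of callees.
calls = {
    "orders": ["payments"],
    "payments": ["inventory"],
    "inventory": ["orders"],
    "catalog": [],
}
cycle = find_cycle(calls)
print(cycle)
```

A check like this either finds a cycle or does not, which is the kind of structural proof the principles above attribute to static analysis. The LLM layer is better reserved for judgments that no graph query can express.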
Topics
- Microservice Architectures
- Anti-Pattern Detection
- Large Language Models
- Static Analysis
- Architectural Assessment
- Prompt Engineering
- MARS Tool
Code references
- yamadashy/repomix
- apolloconfig/apollo
- benwilcock/cqrs-microservice-sampler
- DescartesResearch/TeaStore
- ewolff/microservice
Best for: AI Scientist, AI Architect, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.