Mining Architectural Quality Under Agentic AI Adoption: A Causal Study of Java Repositories

2026-06-12 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Data Science & Analytics · Depth: Expert, extended

Summary

A causal study analyzed how adopting agentic AI coding tools affects software architectural quality in 151 open-source Java repositories. The researchers mined 1,811 monthly Arcan snapshots over 13 months and used a staggered difference-in-differences design. They found an apparent 6.7% reduction in Architectural Smell Density (ASD) (p=0.004) in repositories that adopted tools such as Cursor, GitHub Copilot, Claude Code, and Aider. However, this reduction was a "denominator effect": total architectural smell counts were essentially unchanged (+1.1%, p=0.82), while lines of code grew by 12.8% (p=0.003). This suggests that agentic AI adoption did not measurably degrade architectural quality in established projects within a six-month post-adoption window; instead, it expanded code volume without a proportional increase in structural anti-patterns. The study also warns that density-normalized metrics can mislead when the treatment affects system size.
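Assuming ASD is the architectural smell count S divided by lines of code (a constant factor such as per-KLOC scaling cancels out in log differences), the change in density splits exactly into a numerator term and a denominator term:

```latex
\mathrm{ASD} = \frac{S}{\mathrm{LOC}}
\quad\Longrightarrow\quad
\Delta \log \mathrm{ASD} = \Delta \log S - \Delta \log \mathrm{LOC}
```

When the smell term is near zero and the LOC term is positive, density falls even though the smells themselves are unchanged. The three reported percentages are separate estimates, so plugging them into this identity need not reproduce the 6.7% figure exactly.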

Key takeaway

For software architects and AI engineers evaluating agentic AI coding tool adoption, this study suggests there is no need to escalate architectural safeguards immediately. Code volume grows after adoption, and architectural smell density declines because code grows faster than smells do, not because there are fewer smells. Track raw architectural smell counts, not just density; re-evaluate architectural impact over observation windows longer than six months; and monitor coupling metrics as a complementary signal of long-term structural integrity.

Key insights

Agentic AI tools expand code volume in Java projects without proportionally increasing architectural smells.

Principles

Density-normalized metrics mislead when treatment affects system size.
Architectural quality and code-level quality are distinct concerns.
Observable agentic AI usage leaves detectable repository artifacts.
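To make the last principle concrete, here is a minimal sketch of artifact-based detection over a repository's tracked file paths. The marker paths are assumptions based on configuration files these tools commonly use (.cursorrules and .cursor/rules/, .github/copilot-instructions.md, CLAUDE.md and .claude/, .aider.conf.yml); they are not the study's actual detection rules, and the function name detect_agent_artifacts is hypothetical.

```python
from collections.abc import Iterable

# Hypothetical marker list for illustration only; NOT the study's detection rules.
AGENT_MARKERS: dict[str, tuple[str, ...]] = {
    "Cursor": (".cursorrules", ".cursor/rules/"),
    "GitHub Copilot": (".github/copilot-instructions.md",),
    "Claude Code": ("CLAUDE.md", ".claude/"),
    "Aider": (".aider.conf.yml",),
}


def detect_agent_artifacts(paths: Iterable[str]) -> dict[str, list[str]]:
    """Map each tool to the tracked paths that match one of its markers.

    paths are repository-relative, '/'-separated file paths, such as the
    lines printed by `git ls-files`. A marker ending in '/' matches any
    file under that directory; any other marker must match exactly.
    """
    found: dict[str, list[str]] = {}
    for path in paths:
        for tool, markers in AGENT_MARKERS.items():
            for marker in markers:
                is_dir_hit = marker.endswith("/") and path.startswith(marker)
                if is_dir_hit or path == marker:
                    found.setdefault(tool, []).append(path)
                    break  # one hit per tool per path is enough
    return found


files = ["pom.xml", "CLAUDE.md", ".cursor/rules/java.mdc", "src/Main.java"]
print(detect_agent_artifacts(files))
```

Running this over the file list of each monthly snapshot would yield a per-month usage indicator, and the first positive month could serve as a repository's adoption date in a staggered design; that mapping is likewise an illustrative assumption.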

Method

A staggered difference-in-differences design using the imputation estimator of Borusyak, Jaravel, and Spiess, applied to 1,811 monthly Arcan snapshots from 151 Java repositories.
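The core idea of the imputation estimator can be sketched in three steps: fit unit and period fixed effects on untreated rows only (never-treated and not-yet-treated repository-months), use them to impute each treated row's untreated outcome, then average the differences. This is a minimal sketch assuming integer-coded repository and month ids, no covariates, and a binary post-adoption indicator; the helper name imputation_att and the choice of outcome (for example log ASD) are illustrative assumptions. It omits standard errors and event-time aggregation and is not the study's replication code.

```python
import numpy as np


def imputation_att(unit, time, y, treated):
    """Average effect on treated rows via two-way fixed-effects imputation.

    unit, time: integer ids coded 0..n_units-1 and 0..n_periods-1, one per row.
    y: outcome per row (e.g. log ASD; an illustrative assumption).
    treated: 1 for post-adoption rows of adopting repos, 0 otherwise.
    Hypothetical helper: no covariates, no standard errors.
    """
    unit = np.asarray(unit, dtype=int)
    time = np.asarray(time, dtype=int)
    y = np.asarray(y, dtype=float)
    treated = np.asarray(treated, dtype=bool)
    untreated = ~treated
    n_units, n_periods = unit.max() + 1, time.max() + 1

    # Imputation needs untreated data for every treated unit and period.
    if not set(unit[treated]) <= set(unit[untreated]):
        raise ValueError("every treated unit needs at least one untreated row")
    if not set(time[treated]) <= set(time[untreated]):
        raise ValueError("every treated period needs untreated comparison rows")

    def design(u, t):
        # Unit dummies, then time dummies for periods 1..T-1 (period 0 is the baseline).
        rows = np.arange(len(u))
        X = np.zeros((len(u), n_units + n_periods - 1))
        X[rows, u] = 1.0
        later = t > 0
        X[rows[later], n_units + t[later] - 1] = 1.0
        return X

    # Step 1: fit unit and time fixed effects on untreated rows only.
    beta, *_ = np.linalg.lstsq(design(unit[untreated], time[untreated]),
                               y[untreated], rcond=None)
    # Step 2: impute the untreated outcome Y(0) for each treated row.
    y0_hat = design(unit[treated], time[treated]) @ beta
    # Step 3: average the row-level effects Y - imputed Y(0).
    return float(np.mean(y[treated] - y0_hat))
```

Fitting only on untreated rows is what distinguishes this from a plain two-way fixed-effects regression: already-treated observations never serve as controls, which avoids the "forbidden comparisons" that can bias pooled estimates under staggered adoption.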

In practice

Track raw architectural smell counts alongside density metrics.
Re-examine architectural quality at 12+ months post-adoption.
Decompose density-normalized metrics into numerator/denominator changes.
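The decomposition step in the list above can be sketched as a small check on before/after totals for a single repository. It relies only on the identity that the log change in density equals the log change in smells minus the log change in LOC. The function name decompose_density_change and the 2% tolerance for calling a smell drop genuine are hypothetical choices, not values from the study.

```python
import math


def _pct(log_change: float) -> float:
    """Convert a log change into a percentage change."""
    return 100.0 * math.expm1(log_change)


def decompose_density_change(smells_before, smells_after,
                             loc_before, loc_after, tol=0.02):
    """Split a smell-density change into smell-count and LOC components.

    Uses dlog(density) = dlog(smells) - dlog(loc). All inputs must be
    positive. tol is a hypothetical threshold: a raw-smell log change at
    or below -tol counts as a genuine drop in smells.
    """
    d_smells = math.log(smells_after / smells_before)
    d_loc = math.log(loc_after / loc_before)
    d_density = d_smells - d_loc
    if d_smells <= -tol:
        label = "fewer smells"
    elif d_density < 0:
        label = "denominator effect"  # density fell, raw smells did not
    else:
        label = "no density decrease"
    return {
        "smells_pct": _pct(d_smells),
        "loc_pct": _pct(d_loc),
        "density_pct": _pct(d_density),
        "label": label,
    }


print(decompose_density_change(200, 202, 100_000, 112_800))
```

With these illustrative inputs (not study data), smells rise 1.0% and LOC rises 12.8%, so density drops by roughly 10.5% and the result is labeled a denominator effect: the same pattern the study reports in aggregate.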

Topics

Agentic AI
Software Architecture
Architectural Smells
Causal Inference
Difference-in-Differences
Java Repositories

Code references

Oliver1703dk/seaa2026-replication-package

Best for: AI Scientist, Research Scientist, Software Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.