CAPRA: Scaling Feedback on Software Architecture Deliverables with a Multi-Agent LLM System

Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

CAPRA (Configurable Architecture Proficiency Report Assessment) is a multi-agent LLM system that automates formative feedback on software architecture deliverables in engineering education. A Python-based microservice handles multi-modal document extraction, using PyMuPDF and gpt-4o vision to parse both text and UML diagrams. Its core design choices are a deterministic Evidence Anchoring step, which uses fuzzy matching based on normalized Levenshtein distance, and a ConsistencyManager agent that cross-verifies findings to mitigate hallucinations. In a preliminary evaluation on 10 student reports, CAPRA satisfied 88.8% of criteria under strict aggregation, achieved moderate inter-rater agreement (κ = 0.582), and processed each report in slightly over 4 minutes at approximately $0.44 per report, a 7.2–10.8× speedup over manual review.
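The κ value is a chance-corrected agreement statistic. The summary does not say which kappa variant was used or how labels were paired, so the sketch below assumes Cohen's kappa over paired binary judgments (criterion satisfied or not) from CAPRA and a human reviewer, and maps the result onto the Landis and Koch scale, where 0.41 to 0.60 is read as "moderate". The function names and the toy labels are illustrative only, not the paper's data.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Chance-corrected agreement between two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which both raters gave the same label.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one identical label throughout; kappa is undefined,
        # so by convention report perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


def landis_koch(kappa: float) -> str:
    """Verbal band for a kappa value on the Landis and Koch scale."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")]:
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical per-criterion verdicts (1 = satisfied), not data from the paper.
capra = [1, 1, 0, 1, 0, 1, 1, 0]
human = [1, 0, 0, 1, 0, 1, 1, 1]
k = cohens_kappa(capra, human)
print(round(k, 3), landis_koch(k))  # 0.467 moderate
print(landis_koch(0.582))           # moderate
```

Kappa discounts the agreement two raters would reach by chance given how often each says "satisfied", which is why it can sit well below the raw agreement rate (75% in the toy example).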

Key takeaway

For software engineering educators facing the time-consuming task of reviewing architectural deliverables, CAPRA offers a substantial efficiency gain (a reported 7.2–10.8× speedup over manual review). You can deploy it as a first-pass teaching assistant that returns formative feedback within minutes, freeing your time for more subjective assessments. CAPRA is effective at identifying objective issues and semantic inconsistencies, but human oversight remains crucial for nuanced evaluation of architectural design patterns and for complex grounded issues. Consider integrating it into intermediate evaluations to support high-frequency feedback cycles.

Key insights

CAPRA uses a multi-agent LLM system with deterministic evidence anchoring to automate architectural feedback, improving scalability and reliability.
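Deterministic evidence anchoring can be illustrated with a small sketch. It assumes that each agent finding carries a verbatim quote from the report, that similarity is 1 minus the Levenshtein distance divided by the longer string's length, and that the quote is compared against every same-length window of the whitespace-normalized report text. The summary does not specify CAPRA's normalization, window strategy, or threshold, so `anchor_evidence` and its `threshold=0.85` are hypothetical. Because the check involves no model call, the same quote and text always yield the same score.

```python
import re
from typing import Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions, and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the DP row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def normalized_similarity(a: str, b: str) -> float:
    """1 - distance / longer length, so 1.0 means identical strings."""
    if not a and not b:
        return 1.0
    return 1.0 - levenshtein(a, b) / max(len(a), len(b))


def _normalize(text: str) -> str:
    # Lowercase and collapse whitespace so PDF line breaks do not count as edits.
    return re.sub(r"\s+", " ", text).strip().lower()


def anchor_evidence(quote: str, document: str,
                    threshold: float = 0.85) -> Optional[Tuple[float, int]]:
    """Best (similarity, start offset in normalized text) for `quote`, or None.

    Hypothetical helper: slides a window of the quote's length over the
    normalized document and keeps the highest-scoring position.
    """
    q, doc = _normalize(quote), _normalize(document)
    if not q:
        return None
    window = len(q)
    best_score, best_start = 0.0, 0
    for start in range(max(1, len(doc) - window + 1)):
        score = normalized_similarity(q, doc[start:start + window])
        if score > best_score:
            best_score, best_start = score, start
    return (best_score, best_start) if best_score >= threshold else None


report = "The Payment Service calls the\nOrder Service via REST."
print(anchor_evidence("payment service call the order service", report))
print(anchor_evidence("database sharding strategy", report))  # None
```

Fuzzy rather than exact matching tolerates small discrepancies, such as an agent writing "call" for "calls" or a line break inside the quoted span, while a quote with no real counterpart in the report falls below the threshold and can be discounted or discarded.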

Method

CAPRA's pipeline has four stages: Document Parsing (PyMuPDF, gpt-4o vision), Parallel Verification (specialized agents), Evidence Anchoring (fuzzy matching, confidence modulation), and Report Generation (LaTeX templates, claude-haiku-4.5).
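The verification and anchoring stages can be sketched with `asyncio`: agents run concurrently, each finding's confidence is scaled by how well its quoted evidence anchors in the report, findings whose evidence cannot be anchored are dropped, and criteria on which surviving findings disagree are flagged for review. Everything here is an assumption for illustration: the `Finding` fields, the stub agents standing in for LLM calls, the multiplicative confidence modulation, the `floor=0.5` cutoff, and the disagreement check, which is a deterministic simplification and not a description of how CAPRA's ConsistencyManager works. Parsing and report generation are omitted, and the exact-substring `exact_anchor` is a placeholder for a fuzzy matcher.

```python
import asyncio
from dataclasses import dataclass, replace
from typing import Awaitable, Callable, Dict, List, Set


@dataclass(frozen=True)
class Finding:
    criterion: str      # rubric item the agent checked (hypothetical IDs)
    satisfied: bool     # the agent's verdict
    quote: str          # evidence the agent claims appears in the report
    confidence: float   # agent-reported confidence in [0, 1]


Agent = Callable[[str], Awaitable[List[Finding]]]


async def diagram_agent(text: str) -> List[Finding]:
    # Stub for an LLM agent that checks diagram-related criteria.
    return [Finding("C1-diagram", True, "Figure 2: Component diagram", 0.9)]


async def style_agent(text: str) -> List[Finding]:
    # Stub returning one grounded finding and one with invented evidence.
    return [
        Finding("C2-style", True, "communicate via REST", 0.8),
        Finding("C1-diagram", False, "no component diagram is provided", 0.7),
    ]


async def coupling_agent(text: str) -> List[Finding]:
    # Stub whose verdict on C2 contradicts style_agent.
    return [Finding("C2-style", False, "Services communicate via REST", 0.6)]


async def verify(text: str, agents: List[Agent]) -> List[Finding]:
    # Parallel verification: run every agent concurrently, flatten findings.
    results = await asyncio.gather(*(agent(text) for agent in agents))
    return [f for findings in results for f in findings]


def modulate(findings: List[Finding], anchor_score: Callable[[str], float],
             floor: float = 0.5) -> List[Finding]:
    # Confidence modulation: scale confidence by the anchoring score and drop
    # findings whose quoted evidence cannot be located (score below floor).
    kept = []
    for f in findings:
        score = anchor_score(f.quote)
        if score >= floor:
            kept.append(replace(f, confidence=f.confidence * score))
    return kept


def conflicting_criteria(findings: List[Finding]) -> Set[str]:
    # Criteria that received both a "satisfied" and a "not satisfied" verdict.
    verdicts: Dict[str, Set[bool]] = {}
    for f in findings:
        verdicts.setdefault(f.criterion, set()).add(f.satisfied)
    return {c for c, v in verdicts.items() if len(v) > 1}


def exact_anchor(quote: str, text: str) -> float:
    # Placeholder anchoring score: 1.0 if the quote occurs verbatim, else 0.0.
    return 1.0 if quote in text else 0.0


async def run_pipeline(text: str, agents: List[Agent]):
    findings = modulate(await verify(text, agents),
                        lambda q: exact_anchor(q, text))
    return findings, conflicting_criteria(findings)


REPORT = ("Figure 2: Component diagram of the order system. "
          "Services communicate via REST.")
findings, flagged = asyncio.run(
    run_pipeline(REPORT, [diagram_agent, style_agent, coupling_agent]))
for f in findings:
    print(f.criterion, f.satisfied, f.confidence)
print("needs review:", flagged)  # needs review: {'C2-style'}
```

Gathering the agents concurrently bounds wall-clock time by the slowest agent rather than the sum of all of them, which matters when each agent is a network-bound LLM call.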

Best for: AI Scientist, Research Scientist, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.