CAPRA: Scaling Feedback on Software Architecture Deliverables with a Multi-Agent LLM System
Summary
CAPRA (Configurable Architecture Proficiency Report Assessment) is a multi-agent LLM system designed to automate formative feedback on software architecture deliverables in engineering education. It employs a Python-based microservice for multi-modal document extraction, utilizing PyMuPDF and gpt-4o vision to parse text and UML diagrams. A core design choice is its deterministic Evidence Anchoring step, using fuzzy matching via normalized Levenshtein distance, and a ConsistencyManager agent to cross-verify findings and mitigate hallucinations. A preliminary evaluation on 10 student reports showed CAPRA satisfied 88.8% of criteria under strict aggregation, achieved moderate inter-rater agreement (κ=0.582), and processed each report in slightly over 4 minutes at approximately $0.44 per report, offering a 7.2–10.8× speedup over manual review.
Key takeaway
For software engineering educators struggling with the time-consuming task of reviewing architectural deliverables, CAPRA offers a significant efficiency gain. You can deploy this multi-agent LLM system as a first-pass teaching assistant to provide rapid, formative feedback within minutes, freeing up your time for more subjective assessments. While CAPRA effectively identifies objective issues and semantic inconsistencies, human oversight remains crucial for nuanced architectural design pattern evaluations and complex grounded issues. Consider integrating it for intermediate evaluations to enable high-frequency feedback cycles.
Key insights
CAPRA uses a multi-agent LLM system with deterministic evidence anchoring to automate architectural feedback, improving scalability and reliability.
Principles
- Multi-agent systems improve complex reasoning tasks.
- Deterministic anchoring curbs LLM hallucinations.
- Decompose evaluation into focused, rubric-driven steps.
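To make the anchoring principle concrete, here is a minimal sketch of fuzzy matching with normalized Levenshtein distance. It is not CAPRA's actual implementation: the function names, the word-window search, and the 0.85 threshold are assumptions chosen for illustration. The idea is that a quote an agent claims to have found is only accepted if some span of the source document is close enough to it in edit distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 minus distance over the longer length."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def anchor_quote(quote: str, document: str, threshold: float = 0.85):
    """Return (best matching span, score), or (None, score) if nothing is close enough.

    Hypothetical helper: slides a window with as many words as the quote
    over the document and keeps the most similar window. The threshold is
    an assumed value, not one reported for CAPRA.
    """
    words = document.split()
    n = len(quote.split())
    if n == 0:
        return None, 0.0
    target = " ".join(quote.lower().split())
    best_score, best_span = 0.0, None
    for start in range(max(1, len(words) - n + 1)):
        window = " ".join(words[start:start + n])
        score = similarity(target, window.lower())
        if score > best_score:
            best_score, best_span = score, window
    if best_score >= threshold:
        return best_span, best_score
    return None, best_score


doc = "In our design the gateway routes requests to services."
print(anchor_quote("the gateway routes requsts", doc))  # near match despite a typo
print(anchor_quote("uses a blockchain ledger", doc))    # no supporting span found
```

Because the check is a plain string computation rather than another LLM call, the same quote and document always give the same verdict, which is what makes the step deterministic.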
Method
CAPRA's four-stage pipeline includes Document Parsing (PyMuPDF, gpt-4o vision), Parallel Verification (specialized agents), Evidence Anchoring (fuzzy matching, confidence modulation), and Report Generation (LaTeX templates, claude-haiku-4.5).
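The four stages can be sketched as a small skeleton. Everything here is a placeholder under stated assumptions: the function names, the `Finding` fields, the two stub agents, and the 0.5 confidence penalty are hypothetical, the agents return hard-coded results instead of calling an LLM, a substring check stands in for the fuzzy matcher, and plain text stands in for the LaTeX report.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass


@dataclass
class Finding:
    criterion: str       # rubric criterion the agent evaluated
    quote: str           # evidence the agent claims is in the document
    confidence: float    # agent's self-reported confidence
    anchored: bool = False


def parse_document(raw: str) -> str:
    """Stage 1 stand-in: the real system extracts text and diagrams from PDFs."""
    return " ".join(raw.split())


def layering_agent(text: str) -> Finding:
    """Stub agent (hypothetical): returns a fixed finding instead of an LLM answer."""
    return Finding("layering", "the api layer calls the database directly", 0.9)


def diagram_agent(text: str) -> Finding:
    """Stub agent (hypothetical) whose quote does not occur in the sample document."""
    return Finding("diagram consistency",
                   "component x is missing from the deployment diagram", 0.8)


def verify_in_parallel(text: str, agents) -> list:
    """Stage 2: run every specialized agent on the same parsed text concurrently."""
    with ThreadPoolExecutor() as pool:
        return list(pool.map(lambda agent: agent(text), agents))


def anchor(findings: list, text: str, penalty: float = 0.5) -> list:
    """Stage 3 stand-in: exact substring check in place of fuzzy matching.

    Unanchored findings keep their place in the list, but their confidence
    is scaled down by an assumed penalty factor.
    """
    lowered = text.lower()
    for finding in findings:
        finding.anchored = finding.quote.lower() in lowered
        if not finding.anchored:
            finding.confidence *= penalty
    return findings


def render_report(findings: list) -> str:
    """Stage 4 stand-in: plain text lines instead of a LaTeX template."""
    lines = []
    for f in findings:
        status = "anchored" if f.anchored else "UNANCHORED"
        lines.append(f"{f.criterion}: {status} (confidence {f.confidence:.2f})")
    return "\n".join(lines)


def run_pipeline(raw: str, agents) -> str:
    text = parse_document(raw)
    findings = verify_in_parallel(text, agents)
    return render_report(anchor(findings, text))


sample = "The API layer   calls the database directly.\nNo cache is used."
print(run_pipeline(sample, [layering_agent, diagram_agent]))
```

Keeping anchoring as a separate stage after verification means a claim without support in the parsed text is flagged and down-weighted before it reaches the report, regardless of which agent produced it.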
In practice
- Automate formative feedback on architecture documents.
- Generate personalized, template-compliant LaTeX reports.
- Adapt rubrics from historical student reports.
Topics
- Multi-Agent Systems
- LLM-based Assessment
- Software Architecture
- Formative Feedback
- Evidence Anchoring
- UML Diagram Analysis
Best for: AI Scientist, Research Scientist, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.