CAPRA: Scaling Feedback on Software Architecture Deliverables with a Multi-Agent LLM System
Summary
CAPRA (Configurable Architecture Proficiency Report Assessment) is a multi-agent LLM system that automates feedback generation for software architecture deliverables. It targets the assessment of structural completeness and requirements traceability, an area that has traditionally lacked full automation. A Python-based microservice handles multi-modal document extraction, using PyMuPDF and vision-enabled LLMs such as gpt-4o to parse text and UML diagrams. A core design choice is the coordination of multiple specialized agents. To keep feedback reliable enough for educational use and to mitigate hallucinations, the system adds a deterministic Evidence Anchoring step that uses fuzzy matching via normalized Levenshtein distance, alongside a ConsistencyManager agent for cross-verification, deduplication, and merging of findings. In a preliminary evaluation on 10 student reports, CAPRA satisfied 88.8% of evaluated criteria under a strict two-rater aggregation rule, reached moderate inter-rater agreement with human evaluators (kappa = 0.582), and processed each report in slightly over 4 minutes.
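To make the agreement figure easier to interpret, here is a minimal sketch of how a two-rater kappa is computed. The summary does not say which kappa variant was used; this assumes Cohen's kappa, and the labels below are made-up toy data, not the study's ratings. The function name `cohens_kappa` is illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items where both raters gave the same label.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal label frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical label throughout
    return (observed - expected) / (1.0 - expected)


# Toy data (assumption): 1 = criterion satisfied, 0 = not satisfied.
system_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
human_labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 1]
print(round(cohens_kappa(system_labels, human_labels), 3))
```

In this toy case the raters agree on 8 of 10 items, but about half of that agreement would be expected by chance, which is why kappa comes out well below the raw agreement rate.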
Key takeaway
For AI engineers building automated assessment tools, CAPRA demonstrates a viable approach to scaling feedback on complex deliverables. Consider combining a multi-agent LLM architecture with deterministic evidence anchoring and consistency management to improve reliability. Turnaround is short, at slightly over 4 minutes per report in the preliminary evaluation, but human oversight remains crucial for subjective assessment dimensions.
Key insights
Multi-agent LLM systems can automate feedback on complex software architecture deliverables by combining specialized agents with deterministic evidence anchoring and cross-verification.
Principles
- Decompose complex tasks into specialized agent roles.
- Anchor LLM outputs to source evidence deterministically.
- Employ consistency checks to mitigate hallucinations.
Method
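A Python microservice performs multi-modal extraction with PyMuPDF and gpt-4o. Specialized agents then produce findings, which are checked by a deterministic Evidence Anchoring step (fuzzy matching via normalized Levenshtein distance) and by a ConsistencyManager agent that cross-verifies, deduplicates, and merges them.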
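The evidence anchoring idea can be sketched as follows. This is a minimal illustration, not CAPRA's implementation: the summary only states that fuzzy matching uses normalized Levenshtein distance, so the word-window search, the whitespace and case normalization, the 0.85 threshold, and all function names here are assumptions.

```python
from typing import Optional, Tuple


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """1 minus the Levenshtein distance normalized by the longer length."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences are ignored."""
    return " ".join(text.lower().split())


def anchor_quote(quote: str, document: str,
                 threshold: float = 0.85) -> Optional[Tuple[str, float]]:
    """Return (best matching passage, score), or None if nothing is close enough."""
    target = normalize(quote)
    words = normalize(document).split()
    size = len(target.split())
    if size == 0 or len(words) < size:
        return None
    best: Optional[Tuple[str, float]] = None
    # Slide a window with as many words as the quote across the document.
    for start in range(len(words) - size + 1):
        window = " ".join(words[start:start + size])
        score = similarity(target, window)
        if best is None or score > best[1]:
            best = (window, score)
    return best if best is not None and best[1] >= threshold else None


report = ("The system uses a layered architecture with a REST API gateway. "
          "Data is stored in PostgreSQL.")
# A quote with a small typo still anchors; an invented quote does not.
print(anchor_quote("uses a layerd architecture with a REST API gateway", report))
print(anchor_quote("the database is replicated across three regions", report))
```

Because the check is plain string arithmetic, it gives the same answer on every run, so a finding whose quoted evidence cannot be located in the report can be dropped or flagged without another LLM call.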
In practice
- Use vision-enabled LLMs for multi-modal document parsing.
- Implement fuzzy matching for evidence anchoring.
- Design agent systems with cross-verification mechanisms.
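As one way to picture the deduplication and merging that the ConsistencyManager is described as doing, the sketch below collapses near-duplicate findings reported by different agents. The summary does not describe the actual mechanism, so everything here is an assumption: the `Finding` fields, the agent names, the 0.8 threshold, and the use of `difflib.SequenceMatcher` from the Python standard library as the similarity measure.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher
from typing import List


@dataclass
class Finding:
    agent: str       # hypothetical name of the agent that raised the finding
    criterion: str   # rubric criterion the finding refers to
    message: str     # feedback text


@dataclass
class MergedFinding:
    criterion: str
    message: str
    agents: List[str] = field(default_factory=list)


def is_near_duplicate(a: str, b: str, threshold: float) -> bool:
    """Treat two messages as duplicates when their similarity ratio is high."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_findings(findings: List[Finding], threshold: float = 0.8) -> List[MergedFinding]:
    """Greedy merge: keep the first wording and record every agent that agreed."""
    merged: List[MergedFinding] = []
    for finding in findings:
        for existing in merged:
            if (existing.criterion == finding.criterion
                    and is_near_duplicate(existing.message, finding.message, threshold)):
                if finding.agent not in existing.agents:
                    existing.agents.append(finding.agent)
                break
        else:
            # No similar finding seen yet for this criterion: start a new entry.
            merged.append(MergedFinding(finding.criterion, finding.message, [finding.agent]))
    return merged


raw = [
    Finding("structure_agent", "completeness", "The deployment view is missing."),
    Finding("traceability_agent", "completeness", "The deployment view is missing"),
    Finding("traceability_agent", "traceability",
            "Requirement R3 is not linked to any component."),
]
for item in merge_findings(raw):
    print(item.criterion, item.agents, item.message)
```

Recording which agents independently raised the same issue also gives a simple cross-verification signal: a finding supported by several agents may deserve more trust than one raised by a single agent.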
Topics
- Multi-Agent Systems
- LLM Feedback
- Software Architecture
- Automated Assessment
- Evidence Anchoring
- gpt-4o
Best for: AI Scientist, AI Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.