CAPRA: Scaling Feedback on Software Architecture Deliverables with a Multi-Agent LLM System

· Source: Artificial Intelligence · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

CAPRA (Configurable Architecture Proficiency Report Assessment) is a multi-agent LLM system that automates feedback generation for software architecture deliverables. It targets two assessment dimensions that have traditionally resisted full automation: structural completeness and requirements traceability. A Python-based microservice performs multi-modal document extraction, using PyMuPDF together with vision-enabled LLMs such as gpt-4o to parse text and UML diagrams. The core design choice is the coordination of multiple specialized agents. To keep the feedback reliable enough for educational use and to mitigate hallucinations, the system adds a deterministic Evidence Anchoring step, which uses fuzzy matching based on normalized Levenshtein distance, and a ConsistencyManager agent that cross-verifies, deduplicates, and merges findings. In a preliminary evaluation on 10 student reports, CAPRA satisfied 88.8% of evaluated criteria under a strict two-rater aggregation rule, reached moderate inter-rater agreement with human evaluators (kappa = 0.582), and processed each report in slightly over 4 minutes.
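To make the agreement figure concrete, here is a minimal sketch of how a two-rater kappa can be computed. It assumes the reported value is Cohen's kappa over categorical judgments, which this summary does not specify. The function name `cohens_kappa` and the toy labels are illustrative only and are not data from the evaluation.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance from each rater's marginals.
    """
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty item list")
    n = len(rater_a)

    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of marginal label frequencies, summed over labels.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if expected == 1.0:
        # Both raters used one identical label throughout; treat as full agreement.
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Toy labels (1 = criterion satisfied, 0 = not satisfied); NOT the study's data.
system_labels = [1, 1, 1, 1, 0, 0, 1, 0]
human_labels = [1, 1, 0, 1, 0, 1, 1, 0]
kappa = cohens_kappa(system_labels, human_labels)
```

In this toy case the raters agree on 6 of 8 items (0.75), chance agreement is 34/64, and the resulting kappa is about 0.47. Raw agreement therefore overstates how much two raters agree beyond chance, which is why the evaluation reports kappa alongside the satisfaction rate.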

Key takeaway

For AI engineers building automated assessment tools, CAPRA demonstrates a viable approach to scaling feedback on complex deliverables. Consider combining a multi-agent LLM architecture with deterministic evidence anchoring and consistency management to improve reliability. Processing is fast, at slightly over 4 minutes per report in the preliminary evaluation, but human oversight remains crucial for subjective assessment dimensions.

Key insights

Multi-agent LLM systems can automate complex software architecture feedback by combining specialized agents and robust verification.

Method

CAPRA uses a Python microservice with PyMuPDF and gpt-4o for multi-modal extraction. Specialized agents then process the extracted content, and their findings are verified through fuzzy-matching evidence anchoring and a ConsistencyManager agent.
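The fuzzy-matching verification can be sketched as follows. This is a standard-library illustration of anchoring a quoted piece of evidence to the extracted document text with normalized Levenshtein similarity, not CAPRA's actual implementation. The function names, the sliding-window search, the whitespace and case normalization, and the 0.85 threshold are all assumptions made for the example.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via row-by-row dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 - distance / length of the longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not count as edits."""
    return " ".join(text.lower().split())


def anchor_quote(quote: str, source: str, threshold: float = 0.85):
    """Return (best_window, score) if the quote is supported by the source, else None.

    Slides a window the size of the quote over the source and keeps the best
    score. The threshold value is an assumed default, not one from the paper.
    """
    q, s = normalize(quote), normalize(source)
    if not q or not s:
        return None
    width = len(q)
    if len(s) <= width:
        score = similarity(q, s)
        return (s, score) if score >= threshold else None
    best_window, best_score = "", 0.0
    for start in range(len(s) - width + 1):
        window = s[start:start + width]
        score = similarity(q, window)
        if score > best_score:
            best_window, best_score = window, score
    return (best_window, best_score) if best_score >= threshold else None


# Hypothetical report text and two agent "findings" that quote it.
report = ("The system uses a layered architecture with a REST gateway "
          "in front of three services.")
grounded = anchor_quote("uses a layerd architecture with a REST gateway", report)
fabricated = anchor_quote("the system uses blockchain for consensus among nodes", report)
```

Here `grounded` is accepted despite the typo in the quote, while `fabricated` returns `None` because no window of the report resembles it. The check is deterministic, so a finding whose evidence cannot be located is rejected the same way on every run, independent of any LLM call. The brute-force window scan is quadratic per window and is only meant for short texts.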

Best for: AI Scientist, AI Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.