CAPRA: Scaling Feedback on Software Architecture Deliverables with a Multi-Agent LLM System

2026-06-17 · Source: Artificial Intelligence · Field: Technology & Digital — Software Development & Engineering, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

CAPRA (Configurable Architecture Proficiency Report Assessment) is a multi-agent LLM system that automates feedback generation for software architecture deliverables. It targets two aspects of assessment that have traditionally resisted full automation: structural completeness and requirements traceability. A Python-based microservice handles multi-modal document extraction, using PyMuPDF and vision-enabled LLMs such as gpt-4o to parse text and UML diagrams. A core design choice is the coordination of multiple specialized agents. To keep the feedback reliable enough for educational use and to mitigate hallucinations, the system adds a deterministic Evidence Anchoring step based on fuzzy matching via normalized Levenshtein distance, together with a ConsistencyManager agent that cross-verifies, deduplicates, and merges findings. In a preliminary evaluation on 10 student reports, CAPRA satisfied 88.8% of evaluated criteria under a strict two-rater aggregation rule, achieved moderate inter-rater agreement (kappa = 0.582) with human evaluators, and processed each report in slightly over 4 minutes.

Key takeaway

For AI engineers building automated assessment tools, CAPRA demonstrates a viable approach to scaling feedback on complex deliverables. Consider combining a multi-agent LLM architecture with deterministic evidence anchoring and consistency management to improve reliability. In the preliminary evaluation each report was processed in slightly over 4 minutes, but human oversight remains crucial for subjective assessment dimensions.

Key insights

Multi-agent LLM systems can automate complex software architecture feedback by combining specialized agents and robust verification.

Principles

Decompose complex tasks into specialized agent roles.
Anchor LLM outputs to source evidence deterministically.
Employ consistency checks to mitigate hallucinations.

Method

CAPRA uses a Python microservice with PyMuPDF and gpt-4o for multi-modal extraction. Specialized agents then generate findings, which are verified through fuzzy matching (Evidence Anchoring) and a ConsistencyManager agent.
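
The evidence-anchoring idea can be illustrated with a minimal sketch. The summary only states that CAPRA uses fuzzy matching via normalized Levenshtein distance; everything else here is an assumption made for illustration: that an agent's quoted evidence is compared against the extracted document text, the sliding-window search, the 0.85 threshold, and the function names `levenshtein`, `similarity`, `normalize`, and `anchor_quote`.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insert, delete, substitute) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 - distance / length of longer string."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so layout differences do not count."""
    return " ".join(text.lower().split())


def anchor_quote(quote: str, source: str, threshold: float = 0.85):
    """Find the source window most similar to the quote.

    Returns (score, matched_text) if the best score reaches the threshold,
    otherwise None. The threshold value is an illustrative assumption.
    """
    q, s = normalize(quote), normalize(source)
    if not q or not s:
        return None
    window = len(q)
    best_score, best_span = 0.0, ""
    # Slide a quote-sized window over the source, one character at a time.
    for start in range(max(1, len(s) - window + 1)):
        candidate = s[start:start + window]
        score = similarity(q, candidate)
        if score > best_score:
            best_score, best_span = score, candidate
    return (best_score, best_span) if best_score >= threshold else None


if __name__ == "__main__":
    report = "The system uses a layered architecture with a REST API gateway."
    print(anchor_quote("layered architecture with a REST API gatewy", report))
    print(anchor_quote("uses a microservice mesh with Kafka event sourcing", report))
```

Because the check is deterministic, a finding whose quoted evidence cannot be located above the threshold can be dropped or flagged without another LLM call. The brute-force window scan is quadratic per window and only suitable for short texts; a real pipeline would likely use an optimized matcher.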

In practice

Use vision-enabled LLMs for multi-modal document parsing.
Implement fuzzy matching for evidence anchoring.
Design agent systems with cross-verification mechanisms.
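
The deduplicate-and-merge part of a cross-verification step might look like the sketch below. In CAPRA this work is done by the ConsistencyManager, which is an LLM agent; the deterministic version here is a stand-in for illustration only. The `Finding` structure, the agent names, the use of `difflib.SequenceMatcher`, and the 0.8 threshold are all assumptions, not details from the source.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class Finding:
    criterion: str                             # rubric criterion the finding refers to
    message: str                               # feedback text produced by an agent
    agents: set = field(default_factory=set)   # names of agents that reported it


def _similar(a: str, b: str, threshold: float) -> bool:
    """True if two messages are near-duplicates (case-insensitive ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def merge_findings(findings, threshold: float = 0.8):
    """Merge near-duplicate findings that target the same criterion.

    The first occurrence is kept; later duplicates only contribute their
    agent names, so each merged finding records who reported it.
    """
    merged = []
    for f in findings:
        for kept in merged:
            if kept.criterion == f.criterion and _similar(kept.message, f.message, threshold):
                kept.agents |= f.agents
                break
        else:
            # No near-duplicate found: keep a copy so inputs are not mutated.
            merged.append(Finding(f.criterion, f.message, set(f.agents)))
    return merged


if __name__ == "__main__":
    raw = [
        Finding("traceability", "Requirement R3 is not mapped to any component.", {"TraceabilityAgent"}),
        Finding("traceability", "Requirement R3 is not mapped to any component", {"StructureAgent"}),
        Finding("completeness", "The deployment view is missing.", {"StructureAgent"}),
    ]
    for item in merge_findings(raw):
        print(item.criterion, sorted(item.agents), item.message)
```

A merged finding reported by more than one agent can be treated as corroborated, while single-agent findings are candidates for closer review.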

Topics

Multi-Agent Systems
LLM Feedback
Software Architecture
Automated Assessment
Evidence Anchoring
gpt-4o

Best for: AI Scientist, AI Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.