Trust-Aware Multi-Agent Traceability: Confidence-Calibrated Knowledge Graphs for Consistent Software Artifact Management
Summary
The Trust-Aware Multi-Agent Traceability framework addresses error propagation in sequential AI agent pipelines for software engineering, particularly in safety-critical domains. It proposes a shared knowledge graph as a coordination surface where agents use calibrated confidence scores to assess and build upon each other's contributions. The framework features a two-stage traceability link prediction pipeline combining embedding-based retrieval with LLM-based multi-criteria analysis, a traceability seeding mechanism comparing derivation-time and validation-time confidence, and a consistency protocol for confidence threshold gating, divergence detection, and conflict resolution. Evaluated on 535 automotive artifacts, the system achieved an F1-score of 0.769, significantly outperforming lexical (F1=0.349) and unfiltered LLM (F1=0.669) approaches. It also demonstrated perfect conflict detection at specific hyperparameters and an optimal link creation threshold of 0.7, balancing accuracy and manual effort.
Key takeaway
For AI Engineers and Software Architects managing multi-agent pipelines in safety-critical software development, integrating confidence-calibrated traceability and a consistency protocol is essential. You should implement a shared knowledge graph that actively uses agent-assigned confidence scores to gate low-confidence links, detect divergence between derivation-time and validation-time confidence, and materialize conflicts. This approach supports compliance with standards like ISO 26262 and ASPICE by providing an auditable, consistent record of artifact relationships and identified inconsistencies.
Key insights
Calibrated confidence scores enable trust-aware coordination and consistency detection in multi-agent software engineering pipelines.
Principles
- Upstream agent confidence should signal reliability for downstream decisions.
- Divergence between derivation and validation confidence indicates inconsistencies.
- Knowledge graphs can serve as active coordination surfaces, not just passive storage.
Method
A two-stage pipeline combines embedding-based retrieval with LLM multi-criteria analysis for link prediction, followed by confidence calibration, seeding, and a consistency protocol for conflict detection and resolution.
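The two-stage idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the bag-of-words `embed` function stands in for a real embedding model, and `analyze_link` stands in for the LLM-based multi-criteria analysis, using two hypothetical criteria (`semantic` and `shared_terms`) averaged into one confidence score. It is not the paper's implementation.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_candidates(source, targets, top_k=2):
    """Stage 1: rank target artifacts by similarity and keep the top_k."""
    src_vec = embed(source)
    scored = [(cosine(src_vec, embed(text)), tid) for tid, text in targets.items()]
    scored.sort(reverse=True)
    # Drop candidates that share nothing with the source artifact.
    return [tid for score, tid in scored[:top_k] if score > 0]


def analyze_link(source, target):
    """Stage 2 placeholder: a real system would prompt an LLM here.

    Scores two hypothetical criteria and averages them into a confidence.
    """
    src_terms, tgt_terms = set(source.lower().split()), set(target.lower().split())
    criteria = {
        "semantic": cosine(embed(source), embed(target)),
        "shared_terms": len(src_terms & tgt_terms) / len(src_terms | tgt_terms),
    }
    return sum(criteria.values()) / len(criteria)


def predict_links(source, targets, top_k=2):
    """Run stage 1, then score only the retrieved candidates in stage 2."""
    candidates = retrieve_candidates(source, targets, top_k)
    return {tid: analyze_link(source, targets[tid]) for tid in candidates}
```

The design point is that the expensive second stage only sees the candidates that survive retrieval, so its cost scales with `top_k` rather than with the full artifact set.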
In practice
- Implement two-stage link prediction to improve traceability accuracy.
- Use confidence thresholds to gate low-confidence links in pipelines.
- Materialize detected conflicts as graph entities for human review.
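The gating, divergence detection, and conflict materialization steps listed above might look roughly like the following sketch. The 0.7 link threshold comes from the summary. The class names, the `DIVERGENCE_TOLERANCE` value, and the review queue for rejected links are hypothetical choices made for illustration, not details taken from the paper.

```python
from dataclasses import dataclass, field
from typing import Optional

LINK_THRESHOLD = 0.7        # link creation threshold reported in the summary
DIVERGENCE_TOLERANCE = 0.3  # hypothetical value; the summary does not state one


@dataclass
class TraceLink:
    source: str
    target: str
    derivation_conf: float                   # confidence when the link was derived
    validation_conf: Optional[float] = None  # confidence from a later validation


@dataclass
class Conflict:
    """A detected inconsistency, stored in the graph for human review."""
    link: TraceLink
    reason: str


@dataclass
class KnowledgeGraph:
    links: list = field(default_factory=list)
    conflicts: list = field(default_factory=list)
    review_queue: list = field(default_factory=list)

    def propose(self, link):
        """Gate: admit a link only if its derivation confidence meets the threshold."""
        if link.derivation_conf >= LINK_THRESHOLD:
            self.links.append(link)
            return True
        # Assumption: gated links are queued for manual review, not discarded.
        self.review_queue.append(link)
        return False

    def validate(self, link, validation_conf):
        """Record validation-time confidence and materialize a conflict on divergence."""
        link.validation_conf = validation_conf
        divergence = abs(link.derivation_conf - validation_conf)
        if divergence > DIVERGENCE_TOLERANCE:
            conflict = Conflict(link, f"confidence diverged by {divergence:.2f}")
            self.conflicts.append(conflict)
            return conflict
        return None
```

Storing each `Conflict` as its own object next to the links, rather than only logging it, keeps the inconsistency visible to downstream agents and reviewers who query the graph.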
Topics
- Multi-Agent Systems
- Software Traceability
- Knowledge Graphs
- Confidence Calibration
- LLM Applications
- Safety-Critical Systems
- Automotive Software Engineering
Best for: Research Scientist, AI Architect, AI Scientist, AI Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.