Trust-Aware Multi-Agent Traceability: Confidence-Calibrated Knowledge Graphs for Consistent Software Artifact Management

2026-06-17 · Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

The Trust-Aware Multi-Agent Traceability framework addresses error propagation in sequential AI agent pipelines for software engineering, particularly in safety-critical domains. It proposes a shared knowledge graph as a coordination surface where agents use calibrated confidence scores to assess and build upon each other's contributions. The framework has three parts: a two-stage traceability link prediction pipeline that combines embedding-based retrieval with LLM-based multi-criteria analysis; a traceability seeding mechanism that compares derivation-time and validation-time confidence; and a consistency protocol covering confidence threshold gating, divergence detection, and conflict resolution. Evaluated on 535 automotive artifacts, the system achieved an F1-score of 0.769, outperforming both a lexical baseline (F1=0.349) and an unfiltered LLM approach (F1=0.669). It also detected all conflicts under specific hyperparameter settings, and a link creation threshold of 0.7 gave the best balance between accuracy and manual effort.

Key takeaway

For AI Engineers and Software Architects managing multi-agent pipelines in safety-critical software development, confidence-calibrated traceability and a consistency protocol are essential. Implement a shared knowledge graph that uses agent-assigned confidence scores to gate low-confidence links, detect divergence between derivation-time and validation-time confidence, and materialize conflicts as reviewable entities. This approach supports compliance with standards like ISO 26262 and ASPICE by providing an auditable, consistent record of artifact relationships and identified inconsistencies.

Key insights

Calibrated confidence scores enable trust-aware coordination and consistency detection in multi-agent software engineering pipelines.

Principles

Upstream agent confidence should signal reliability for downstream decisions.
Divergence between derivation and validation confidence indicates inconsistencies.
Knowledge graphs can serve as active coordination surfaces, not just passive storage.
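
The divergence principle can be made concrete with a small sketch. It assumes each link carries two scores in [0, 1], one recorded when the link was derived and one recorded when it was validated, and flags the link when the gap between them exceeds a tolerance. The names SeededLink, divergence, and is_divergent, and the 0.3 tolerance, are illustrative assumptions and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SeededLink:
    """A trace link carrying two confidence scores (hypothetical structure)."""
    source_id: str
    target_id: str
    derivation_conf: float  # score reported when the link was derived
    validation_conf: float  # score from a later, independent validation pass


def divergence(link: SeededLink) -> float:
    """Absolute gap between derivation-time and validation-time confidence."""
    return abs(link.derivation_conf - link.validation_conf)


def is_divergent(link: SeededLink, tolerance: float = 0.3) -> bool:
    """Flag a link whose two scores disagree by more than the tolerance.

    The default tolerance is an assumed value chosen for illustration.
    """
    return divergence(link) > tolerance


suspect = SeededLink("REQ-1", "TEST-1", derivation_conf=0.9, validation_conf=0.4)
stable = SeededLink("REQ-2", "TEST-2", derivation_conf=0.8, validation_conf=0.75)
print(is_divergent(suspect), is_divergent(stable))
```

A large gap does not say which score is wrong; it only marks the link as a candidate inconsistency for the consistency protocol to handle.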

Method

A two-stage pipeline combines embedding-based retrieval with LLM multi-criteria analysis for link prediction, followed by confidence calibration, seeding, and a consistency protocol for conflict detection and resolution.
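
A minimal sketch of the two-stage shape described above, under stated assumptions: stage one ranks candidate artifacts by vector similarity and keeps the top k, and stage two asks a scorer for per-criterion scores and averages them into a link confidence. A toy bag-of-words vector stands in for a learned embedding, and stub_scorer stands in for the LLM call. The criterion names and the mean aggregation are hypothetical, since the summary does not specify them.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Tuple

CriteriaScorer = Callable[[str, str], Dict[str, float]]


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a real system would use a learned encoder."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, candidates: Dict[str, str], k: int) -> List[str]:
    """Stage 1: keep the k candidates most similar to the query."""
    q = embed(query)
    ranked = sorted(
        candidates, key=lambda cid: cosine(q, embed(candidates[cid])), reverse=True
    )
    return ranked[:k]


def predict_links(
    query: str, candidates: Dict[str, str], scorer: CriteriaScorer, k: int = 2
) -> List[Tuple[str, float]]:
    """Stage 2: score each retrieved candidate on several criteria and average."""
    results = []
    for cid in retrieve(query, candidates, k):
        scores = scorer(query, candidates[cid])
        results.append((cid, sum(scores.values()) / len(scores)))
    return results


def stub_scorer(requirement: str, artifact: str) -> Dict[str, float]:
    """Placeholder for an LLM call; criterion names are hypothetical."""
    overlap = cosine(embed(requirement), embed(artifact))
    return {"semantic_fit": overlap, "functional_fit": overlap}


artifacts = {
    "TEST-1": "test brake pressure sensor fault reporting",
    "TEST-2": "test door lock actuator",
    "TEST-3": "infotainment menu layout check",
}
print(predict_links("brake pressure sensor shall report faults", artifacts, stub_scorer, k=1))
```

Retrieval keeps the expensive second stage from running on every artifact pair, which is the usual motivation for splitting link prediction this way.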

In practice

Implement two-stage link prediction to improve traceability accuracy.
Use confidence thresholds to gate low-confidence links in pipelines.
Materialize detected conflicts as graph entities for human review.
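
The gating and conflict-materialization steps above can be sketched with a small in-memory graph. The 0.7 threshold is the link creation threshold reported in the summary; everything else, including the TraceGraph class, the review queue for gated links, and the conflict node fields, is an assumed structure for illustration rather than the paper's schema.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

LINK_THRESHOLD = 0.7  # link creation threshold reported in the summary

Link = Tuple[str, str, float]


@dataclass
class TraceGraph:
    """Minimal in-memory stand-in for a shared knowledge graph (hypothetical)."""
    links: List[Link] = field(default_factory=list)
    review_queue: List[Link] = field(default_factory=list)
    conflicts: List[Dict[str, object]] = field(default_factory=list)

    def propose_link(self, src: str, dst: str, confidence: float) -> bool:
        """Create the link only if it clears the threshold; otherwise queue it."""
        if confidence >= LINK_THRESHOLD:
            self.links.append((src, dst, confidence))
            return True
        # Assumption: gated links are kept for manual review, not discarded.
        self.review_queue.append((src, dst, confidence))
        return False

    def materialize_conflict(self, src: str, dst: str, reason: str) -> Dict[str, object]:
        """Record a detected conflict as its own node so a human can review it."""
        node = {
            "id": f"conflict-{len(self.conflicts) + 1}",
            "type": "Conflict",
            "between": (src, dst),
            "reason": reason,
            "status": "open",
        }
        self.conflicts.append(node)
        return node


graph = TraceGraph()
graph.propose_link("REQ-1", "TEST-1", 0.82)  # accepted
graph.propose_link("REQ-1", "TEST-2", 0.55)  # gated into the review queue
graph.materialize_conflict("REQ-1", "TEST-1", "derivation and validation confidence diverge")
print(len(graph.links), len(graph.review_queue), graph.conflicts[0]["id"])
```

Storing conflicts as first-class nodes, rather than as log entries, keeps them queryable alongside the artifacts they concern, which is what makes the record auditable.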

Topics

Multi-Agent Systems
Software Traceability
Knowledge Graphs
Confidence Calibration
LLM Applications
Safety-Critical Systems
Automotive Software Engineering

Best for: Research Scientist, AI Architect, AI Scientist, AI Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.