Trust-Aware Multi-Agent Traceability: Confidence-Calibrated Knowledge Graphs for Consistent Software Artifact Management

· Source: cs.SE updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

The Trust-Aware Multi-Agent Traceability framework addresses error propagation in sequential AI agent pipelines for software engineering, particularly in safety-critical domains. It proposes a shared knowledge graph as a coordination surface on which agents use calibrated confidence scores to assess and build on each other's contributions. The framework has three parts: a two-stage traceability link prediction pipeline that combines embedding-based retrieval with LLM-based multi-criteria analysis; a traceability seeding mechanism that compares derivation-time and validation-time confidence; and a consistency protocol covering confidence threshold gating, divergence detection, and conflict resolution. Evaluated on 535 automotive artifacts, the system achieved an F1-score of 0.769, compared with 0.349 for a lexical approach and 0.669 for an unfiltered LLM approach. It also achieved perfect conflict detection at specific hyperparameter settings, and a link creation threshold of 0.7 gave the best balance between accuracy and manual effort.

Key takeaway

For AI engineers and software architects managing multi-agent pipelines in safety-critical software development, integrating confidence-calibrated traceability and a consistency protocol is essential. Implement a shared knowledge graph that uses agent-assigned confidence scores to gate low-confidence links, detect divergence between derivation-time and validation-time confidence, and materialize conflicts. This approach supports compliance with standards such as ISO 26262 and ASPICE by providing an auditable, consistent record of artifact relationships and identified inconsistencies.
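
The gating and divergence checks described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `CandidateLink` and `Decision` types, the `gate` function, and the divergence tolerance of 0.3 are all hypothetical and chosen only for illustration. The 0.7 link creation threshold is the one value taken from the summary.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ACCEPT = "accept"      # link is written to the shared graph
    REVIEW = "review"      # below the threshold, routed to manual review
    CONFLICT = "conflict"  # derivation and validation confidence diverge


@dataclass(frozen=True)
class CandidateLink:
    source_id: str
    target_id: str
    derivation_conf: float  # confidence assigned when the link was derived
    validation_conf: float  # confidence assigned by the validating agent


LINK_THRESHOLD = 0.7        # link creation threshold reported in the summary
DIVERGENCE_TOLERANCE = 0.3  # hypothetical value, not taken from the source


def gate(link: CandidateLink,
         threshold: float = LINK_THRESHOLD,
         tolerance: float = DIVERGENCE_TOLERANCE) -> Decision:
    """Classify a candidate link: conflict first, then threshold gating."""
    # Divergence detection: the two confidence scores disagree too strongly.
    if abs(link.derivation_conf - link.validation_conf) > tolerance:
        return Decision.CONFLICT
    # Threshold gating: low-confidence links are not created automatically.
    if link.validation_conf < threshold:
        return Decision.REVIEW
    return Decision.ACCEPT
```

Checking divergence before the threshold is a design choice made for this sketch: a link whose two scores disagree is recorded as a conflict even if one of them clears the threshold, so the disagreement stays visible in the audit record.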

Key insights

Calibrated confidence scores enable trust-aware coordination and consistency detection in multi-agent software engineering pipelines.

Method

A two-stage pipeline combines embedding-based retrieval with LLM multi-criteria analysis for link prediction, followed by confidence calibration, seeding, and a consistency protocol for conflict detection and resolution.
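
As a rough illustration of the two-stage idea, the sketch below retrieves the top-k candidates by cosine similarity and then passes each one to a scoring callable that stands in for the LLM multi-criteria analysis. Everything here is an assumption made for the example: the function names, the toy two-dimensional embeddings, the criteria returned by the scorer, and the use of a plain mean to aggregate them. Only the 0.7 threshold comes from the summary.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
# Hypothetical scorer signature: (source_id, target_id) -> {criterion: score in [0, 1]}
Scorer = Callable[[str, str], Dict[str, float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query: Vector, corpus: Dict[str, Vector], k: int) -> List[str]:
    """Stage 1: return the ids of the k artifacts most similar to the query."""
    ranked = sorted(corpus, key=lambda aid: cosine(query, corpus[aid]), reverse=True)
    return ranked[:k]


def predict_links(source_id: str, query: Vector, corpus: Dict[str, Vector],
                  score_fn: Scorer, k: int = 2,
                  threshold: float = 0.7) -> List[Tuple[str, float]]:
    """Stage 2: score each retrieved candidate and keep those above the threshold."""
    links = []
    for target_id in retrieve(query, corpus, k):
        criteria = score_fn(source_id, target_id)
        if not criteria:
            continue  # the scorer produced no criteria for this pair
        # Assumption: aggregate criteria with a plain mean.
        confidence = sum(criteria.values()) / len(criteria)
        if confidence >= threshold:
            links.append((target_id, confidence))
    return links
```

In a real pipeline the scorer would wrap an LLM call and the vectors would come from an embedding model. The callable boundary keeps the retrieval stage testable without either.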

Best for: Research Scientist, AI Architect, AI Scientist, AI Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.SE updates on arXiv.org.