ProvenanceGuard: Source-Aware Factuality Verification for MCP-Based LLM Agents

Source: Artificial Intelligence · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

ProvenanceGuard is a source-aware verifier for LLM agents that use the Model Context Protocol (MCP) to synthesize answers from heterogeneous evidence sources. It targets cross-source conflation, a failure mode in which a claim is factually supported but attributed to the wrong source. The system consumes MCP traces, decomposes each answer into atomic claims, routes each claim to its source-specific evidence, and verifies support using natural language inference (NLI) and token alignment. It then compares the agent's stated attribution against the routed source and returns per-claim verdicts plus an overall allow/block decision. The evaluation used 281 medical-domain MCP-agent traces; on a 40-trace held-out split, ProvenanceGuard achieved a block F1 of 0.802 and a source accuracy of 0.858, outperforming source-blind baselines. It also detected all injected attribution swaps in 50 clinical conflation probes, which supports treating source attribution as an independent axis of factuality verification.
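To make the two reported metrics concrete, here is a minimal sketch of how they could be computed. It rests on assumptions the summary does not confirm: that block F1 is the standard F1 score with "block" as the positive class, and that source accuracy is the fraction of claims whose routed source matches a gold source label. The function names and the toy labels are hypothetical and are not taken from the paper.

```python
def block_f1(gold, pred):
    """F1 over trace-level decisions, treating "block" as the positive class (assumed definition)."""
    pairs = list(zip(gold, pred))
    tp = sum(g == "block" and p == "block" for g, p in pairs)
    fp = sum(g == "allow" and p == "block" for g, p in pairs)
    fn = sum(g == "block" and p == "allow" for g, p in pairs)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def source_accuracy(gold_sources, routed_sources):
    """Fraction of claims routed to the gold source (assumed definition)."""
    if not gold_sources:
        return 0.0
    hits = sum(g == r for g, r in zip(gold_sources, routed_sources))
    return hits / len(gold_sources)


# Toy labels invented for illustration only; these are not the paper's data.
toy_gold = ["block", "block", "allow", "allow", "block"]
toy_pred = ["block", "allow", "allow", "block", "block"]
toy_f1 = block_f1(toy_gold, toy_pred)

toy_acc = source_accuracy(["a", "b", "a", "c"], ["a", "b", "c", "c"])
```

Under these assumed definitions the two numbers measure different things: block F1 scores the trace-level allow/block decision, while source accuracy scores per-claim routing.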

Key takeaway

NLP engineers and research scientists who develop or deploy MCP-based LLM agents should move beyond pooled-evidence factuality checks. This research indicates that source attribution is an independent dimension of verification: a claim can be supported by the pooled evidence and still be attributed to the wrong source. Integrating a source-aware verifier such as ProvenanceGuard into the agent pipeline lets you detect and mitigate cross-source conflation, so that claims are not only supported but also correctly attributed. This matters most when agents draw on sensitive or heterogeneous data sources.

Key insights

Accurate source attribution is an independent and critical dimension for factuality verification in MCP-based LLM agents.

Principles

Method

ProvenanceGuard decomposes each answer into atomic claims, routes each claim to its source-specific evidence, checks support via NLI and token alignment, compares the stated attribution with the routed source, and returns per-claim verdicts along with an overall allow/block decision.
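The steps above can be sketched roughly as follows. This is an illustrative toy, not ProvenanceGuard's implementation: it assumes the claims have already been decomposed, it substitutes a simple token-overlap score for the NLI and token-alignment checks, and it assumes the answer is blocked if any claim is unsupported or misattributed. The names `Claim`, `route`, `verify`, the 0.6 threshold, and the example sources are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class Claim:
    text: str
    stated_source: str  # source the agent attributed this claim to


def tokens(text):
    """Lowercased alphanumeric tokens of a string."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(claim_text, evidence):
    """Fraction of claim tokens present in the evidence (stand-in for NLI + token alignment)."""
    claim_tokens = tokens(claim_text)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(evidence)) / len(claim_tokens)


def route(claim, evidence_by_source):
    """Return (source, score) for the source whose evidence best covers the claim."""
    scored = ((src, overlap(claim.text, ev)) for src, ev in evidence_by_source.items())
    return max(scored, key=lambda pair: pair[1])


def verify(claims, evidence_by_source, threshold=0.6):
    """Return per-claim verdicts and an overall allow/block decision."""
    verdicts = []
    for claim in claims:
        routed, score = route(claim, evidence_by_source)
        if score < threshold:
            label = "unsupported"
        elif routed != claim.stated_source:
            label = "misattributed"  # supported, but by a different source than stated
        else:
            label = "supported"
        verdicts.append((claim.text, label, routed))
    # Assumed aggregation rule: block unless every claim is supported and correctly attributed.
    decision = "allow" if all(v[1] == "supported" for v in verdicts) else "block"
    return verdicts, decision


# Toy evidence and claims invented for illustration only.
evidence = {
    "drug_db": "Metformin is a first-line treatment for type 2 diabetes.",
    "guideline_server": "Adults should have blood pressure checked annually.",
}
claims = [
    Claim("Metformin is a first-line treatment for type 2 diabetes", "guideline_server"),
    Claim("Adults should have blood pressure checked annually", "guideline_server"),
]
verdicts, decision = verify(claims, evidence)
```

In this toy run the first claim is fully supported by the pooled evidence yet attributed to the wrong source, so a source-blind check would pass it while the source-aware comparison flags it and blocks the answer.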

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.