PropGuard: Safeguarding LLM-MAS via Propagation-Aware Exploration and Remediation

· Source: cs.LG updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

PropGuard is a propagation-aware framework for safeguarding Large Language Model-based Multi-Agent Systems (LLM-MAS) against malicious instructions that spread through inter-agent collaboration. It addresses limitations of existing local-filtering and graph-based anomaly-detection methods by tracing fine-grained propagation paths and remediating contaminated states without disrupting benign collaboration. PropGuard constructs a dual-view spatio-temporal graph that combines response-centric risk estimation with full-state evidence preservation. An inspector trained with GE-GRPO explores this graph to recover compact suspicious propagation subgraphs. Each subgraph is then diagnosed to verify whether harmful propagation occurred, and source-guided remediation corrects the upstream contamination and replays the affected downstream interactions. Experiments across four communication architectures (chain, tree, star, random) and five attack settings spanning prompt injection, tool attacks, and memory attacks show that PropGuard consistently lowers attack success rates while maintaining high task-level defense success, achieving a favorable effectiveness–efficiency trade-off.
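To make the dual-view spatio-temporal graph concrete, here is a minimal sketch under stated assumptions. It assumes each node is one agent at one round, that temporal edges link an agent to itself in the next round, and that spatial edges follow the communication topology from round t-1 to round t. The Node fields, the helper names topology_edges and build_st_graph, and the binary-tree and random-graph constructions are all hypothetical; the summary does not specify PropGuard's actual schema.

```python
import random
from dataclasses import dataclass, field

# Hypothetical node payload; PropGuard's real node schema is not described in the summary.
@dataclass
class Node:
    agent: int
    round_idx: int
    response: str = ""                         # response-centric view, scored for risk
    state: dict = field(default_factory=dict)  # full-state view (prompt, memory, tool calls) kept as evidence
    risk: float = 0.0

def topology_edges(kind: str, n: int, p: float = 0.3, seed: int = 0) -> list:
    """Directed agent-to-agent edges for the four evaluated architectures (illustrative)."""
    if kind == "chain":
        return [(i, i + 1) for i in range(n - 1)]
    if kind == "tree":
        # Binary tree: agent i sends to children 2i+1 and 2i+2.
        return [(i, c) for i in range(n) for c in (2 * i + 1, 2 * i + 2) if c < n]
    if kind == "star":
        # Hub agent 0 exchanges messages with every other agent.
        return [(0, i) for i in range(1, n)] + [(i, 0) for i in range(1, n)]
    if kind == "random":
        rng = random.Random(seed)  # seeded so the sampled topology is reproducible
        return [(i, j) for i in range(n) for j in range(n) if i != j and rng.random() < p]
    raise ValueError(f"unknown topology: {kind}")

def build_st_graph(kind: str, n_agents: int, n_rounds: int):
    """Nodes are (agent, round) pairs; every edge points from round t-1 to round t."""
    nodes = {(a, t): Node(a, t) for a in range(n_agents) for t in range(n_rounds)}
    spatial = topology_edges(kind, n_agents)
    edges = []
    for t in range(1, n_rounds):
        edges += [((a, t - 1), (a, t), "temporal") for a in range(n_agents)]
        edges += [((u, t - 1), (v, t), "spatial") for u, v in spatial]
    return nodes, edges
```

For example, build_st_graph("star", 4, 3) yields 12 nodes and 20 edges (per round after the first: 4 temporal plus 6 spatial). Because every edge moves forward one round, the graph is a DAG, which keeps later tracing and replay well defined.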

Key takeaway

For CTOs and VPs of Engineering/Data overseeing LLM-MAS deployments, PropGuard offers a defense against malicious instructions that propagate between agents. Because it traces contamination through a spatio-temporal graph and remediates at the source rather than simply filtering messages, it lowered attack success rates in the reported experiments while preserving task-level utility. Consider integrating propagation-aware defenses like PropGuard to improve the resilience and trustworthiness of your multi-agent systems, especially in complex, collaborative environments.

Key insights

PropGuard defends LLM-MAS by tracing malicious propagation through dual-view graphs and remediating at the source.
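The idea of tracing back to the source and then remediating can be illustrated with a minimal sketch. It walks backward from a flagged node through risky ancestors, treats the earliest risky nodes as sources, and then walks forward to collect every downstream node that would need replaying. The fixed risk threshold, the definition of a source, and the function name trace_propagation are assumptions for illustration only; in PropGuard, subgraph recovery is performed by the GE-GRPO-trained inspector, not by a threshold rule.

```python
from collections import defaultdict, deque

def trace_propagation(edges, risk, flagged, threshold=0.5):
    """Hypothetical source tracing and replay scoping on a round-ordered DAG.

    edges: (src, dst) pairs whose nodes are (agent, round) tuples.
    risk:  node -> risk score in [0, 1]; unknown nodes count as 0.
    """
    parents, children = defaultdict(list), defaultdict(list)
    for src, dst in edges:
        parents[dst].append(src)
        children[src].append(dst)

    # Backward pass: follow parents whose risk meets the threshold.
    suspicious, queue = {flagged}, deque([flagged])
    while queue:
        node = queue.popleft()
        for p in parents[node]:
            if risk.get(p, 0.0) >= threshold and p not in suspicious:
                suspicious.add(p)
                queue.append(p)
    # A source is a suspicious node with no suspicious parent.
    sources = {n for n in suspicious if not any(p in suspicious for p in parents[n])}

    # Forward pass: everything reachable from a source is potentially contaminated.
    affected, queue = set(sources), deque(sources)
    while queue:
        node = queue.popleft()
        for c in children[node]:
            if c not in affected:
                affected.add(c)
                queue.append(c)
    # Sources get corrected; the rest is replayed in round order so each node
    # sees already-corrected upstream inputs.
    replay_order = sorted(affected - sources, key=lambda n: (n[1], n[0]))
    return sources, replay_order

edges = [((0, 0), (1, 1)), ((1, 1), (2, 2)), ((2, 1), (2, 2))]
risk = {(0, 0): 0.9, (1, 1): 0.8, (2, 2): 0.95, (2, 1): 0.1}
sources, replay = trace_propagation(edges, risk, flagged=(2, 2))
print(sources, replay)  # {(0, 0)} [(1, 1), (2, 2)]
```

Sorting by round is a valid topological order here only because every edge advances exactly one round; a graph with within-round edges would need a general topological sort.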

Method

PropGuard models LLM-MAS as a dual-view spatio-temporal graph, uses a GE-GRPO-trained inspector for subgraph exploration, then performs subgraph-aware diagnosis and source-guided remediation to correct contamination and replay interactions.
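GE-GRPO builds on GRPO (Group Relative Policy Optimization), which samples a group of rollouts per input and scores each one relative to the group's mean and standard deviation instead of using a learned value function. The summary does not say what GE-GRPO changes or how the inspector is rewarded, so the sketch below shows only the standard group-relative advantage, paired with a hypothetical reward (subgraph_reward) that favors overlap with a reference propagation subgraph and penalizes size to encourage compactness. Neither the reward nor its penalty weight comes from the paper.

```python
import statistics

def subgraph_reward(pred: set, gold: set, size_penalty: float = 0.05) -> float:
    """Hypothetical reward: F1 against a reference subgraph minus a per-node size penalty."""
    tp = len(pred & gold)
    if tp == 0:
        return -size_penalty * len(pred)
    precision, recall = tp / len(pred), tp / len(gold)
    f1 = 2 * precision * recall / (precision + recall)
    return f1 - size_penalty * len(pred)

def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """Standard GRPO advantage: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]

# One group of three hypothetical inspector rollouts for the same interaction trace.
gold = {(0, 0), (1, 1), (2, 2)}
rollouts = [
    {(0, 0), (1, 1), (2, 2)},                  # exact subgraph
    {(0, 0), (1, 1), (2, 2), (2, 1), (0, 1)},  # correct but padded with benign nodes
    {(2, 2)},                                  # compact but misses the source
]
rewards = [subgraph_reward(p, gold) for p in rollouts]
advantages = group_relative_advantages(rewards)
```

With these inputs the exact subgraph receives the only positive advantage, so a policy-gradient update would push the inspector toward subgraphs that are both complete and compact. The eps term keeps the advantages finite (all zero) when every rollout in a group earns the same reward.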

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.