PropGuard: Safeguarding LLM-MAS via Propagation-Aware Exploration and Remediation
Summary
PropGuard is a propagation-aware framework for safeguarding Large Language Model-based Multi-Agent Systems (LLM-MAS) against malicious instructions that spread through inter-agent collaboration. Existing defenses rely on local filtering or graph-based anomaly detection; PropGuard instead traces fine-grained propagation paths and remediates contaminated states without disrupting benign collaboration. It constructs a dual-view spatio-temporal graph that combines response-centric risk estimation with full-state evidence preservation. An inspector trained with GE-GRPO explores this graph to recover compact suspicious propagation subgraphs. Each subgraph is then diagnosed to verify harmful propagation, and source-guided remediation corrects the upstream contamination and replays the affected downstream interactions. Experiments across four communication architectures (chain, tree, star, random) and five attack settings spanning prompt injection, tool attacks, and memory attacks show that PropGuard consistently lowers attack success rates while maintaining high task-level defense success, achieving a favorable effectiveness–efficiency trade-off.
Key takeaway
For CTOs or VPs of Engineering/Data overseeing LLM-MAS deployments, PropGuard offers a defense against malicious instructions that propagate between agents. Because it traces contamination through a spatio-temporal graph and remediates at the source rather than only filtering locally, it lowers attack success rates in the reported experiments while preserving system utility. Consider propagation-aware defenses such as PropGuard when hardening multi-agent systems, especially in complex, collaborative environments.
Key insights
PropGuard defends LLM-MAS by tracing malicious propagation through dual-view graphs and remediating at the source.
Principles
- Malicious influence propagates spatio-temporally.
- Dual-view graphs enable efficient risk estimation and rich evidence preservation.
- Source-guided remediation preserves utility.
Method
PropGuard models LLM-MAS as a dual-view spatio-temporal graph, uses a GE-GRPO-trained inspector for subgraph exploration, then performs subgraph-aware diagnosis and source-guided remediation to correct contamination and replay interactions.
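To make the dual-view idea concrete, here is a minimal sketch in Python. It is not the paper's implementation: the names Node, DualViewGraph, and explore are hypothetical, the risk scores are hand-assigned, and a greedy best-first search stands in for the GE-GRPO-trained inspector, whose policy the summary does not describe. The response view is a cheap per-node risk score, the state view keeps the full evidence for later diagnosis, and exploration returns a compact suspicious subgraph.

```python
from dataclasses import dataclass, field
import heapq


@dataclass(frozen=True, order=True)
class Node:
    """One agent at one interaction step (a spatio-temporal node)."""
    agent: str
    step: int


@dataclass
class DualViewGraph:
    """Hypothetical container holding both views of the same nodes."""
    risk: dict = field(default_factory=dict)   # response view: score in [0, 1]
    state: dict = field(default_factory=dict)  # state view: full evidence
    succ: dict = field(default_factory=dict)   # message edges, forward in time
    pred: dict = field(default_factory=dict)   # same edges, reversed

    def add_node(self, node, risk, state):
        self.risk[node] = risk
        self.state[node] = state
        self.succ.setdefault(node, [])
        self.pred.setdefault(node, [])

    def add_edge(self, src, dst):
        self.succ[src].append(dst)
        self.pred[dst].append(src)


def explore(graph, seed, budget=4, threshold=0.5):
    """Greedy stand-in for the learned inspector (an assumption).

    Expands from the seed to upstream and downstream neighbours,
    highest risk first, and keeps at most `budget` nodes whose risk
    is at least `threshold`.
    """
    selected = set()
    frontier = [(-graph.risk[seed], seed)]
    while frontier and len(selected) < budget:
        neg_risk, node = heapq.heappop(frontier)
        if node in selected or -neg_risk < threshold:
            continue
        selected.add(node)
        for nb in graph.succ[node] + graph.pred[node]:
            if nb not in selected:
                heapq.heappush(frontier, (-graph.risk[nb], nb))
    return selected


# Toy run: A is injected at step 0, the instruction reaches B and then C.
# D also hears from A but its response looks benign.
g = DualViewGraph()
a0, b1, c2, d1 = Node("A", 0), Node("B", 1), Node("C", 2), Node("D", 1)
g.add_node(a0, 0.9, {"memory": ["injected instruction"], "tools": []})
g.add_node(b1, 0.8, {"memory": ["forwarded instruction"], "tools": []})
g.add_node(c2, 0.7, {"memory": [], "tools": ["unexpected call"]})
g.add_node(d1, 0.1, {"memory": [], "tools": []})
g.add_edge(a0, b1)
g.add_edge(a0, d1)
g.add_edge(b1, c2)

suspicious = explore(g, seed=c2)
source = min(suspicious, key=lambda n: n.step)  # earliest node in the subgraph
```

In this toy run the exploration starts from the node where harm was observed, walks upstream, and leaves the benign branch D out of the subgraph. The preserved state entries for the selected nodes are what a diagnosis step would then inspect.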
In practice
- Use dual-view graphs for LLM-MAS security.
- Employ RL-driven inspectors for subgraph exploration.
- Prioritize source-guided remediation over pruning.
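The difference between source-guided remediation and pruning can be sketched as follows. This is an illustration under stated assumptions, not PropGuard's actual procedure: nodes are (agent, step) tuples, every message edge points strictly forward in time, and respond is a hypothetical deterministic stand-in for re-running an agent on its inputs. Instead of cutting the contaminated agent out of the graph, the sketch corrects the source output and replays only the nodes downstream of it, leaving unrelated interactions untouched.

```python
from collections import deque


def downstream(succ, source):
    """Nodes reachable from `source` along message edges (source excluded)."""
    seen, queue = set(), deque([source])
    while queue:
        node = queue.popleft()
        for nxt in succ.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def remediate(outputs, succ, pred, source, clean_output, respond):
    """Correct the source, then replay affected nodes in time order.

    Sorting by step is sufficient only because edges are assumed to
    point strictly forward in time, so every predecessor of a node is
    recomputed before the node itself.
    """
    fixed = dict(outputs)          # unaffected nodes keep their outputs
    fixed[source] = clean_output
    for node in sorted(downstream(succ, source), key=lambda n: n[1]):
        inputs = [fixed[p] for p in pred.get(node, [])]
        fixed[node] = respond(node, inputs)
    return fixed


def respond(node, inputs):
    """Toy agent (hypothetical): wraps its inputs with its own name."""
    return f"{node[0]}({','.join(inputs)})"


# A's output at step 0 is contaminated and flows to B, then C.
# E -> D is an unrelated, benign exchange.
succ = {("A", 0): [("B", 1)], ("B", 1): [("C", 2)], ("E", 0): [("D", 1)]}
pred = {("B", 1): [("A", 0)], ("C", 2): [("B", 1)], ("D", 1): [("E", 0)]}
outputs = {
    ("A", 0): "IGNORE RULES",
    ("B", 1): "B(IGNORE RULES)",
    ("C", 2): "C(B(IGNORE RULES))",
    ("E", 0): "ok",
    ("D", 1): "D(ok)",
}

fixed = remediate(outputs, succ, pred, ("A", 0), "plan", respond)
```

All five agents remain in the system after remediation. Only the A, B, C path is recomputed, which is the sense in which correcting the source preserves utility that pruning would discard.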
Topics
- LLM-MAS Safeguarding
- Propagation-Aware Defense
- Dual-View Spatio-Temporal Graph
- GE-GRPO
- RL-Driven Subgraph Exploration
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.LG updates on arXiv.org.