MAStrike: Shapley-Guided Collusive Red-Teaming on Multi-Agent Systems
Summary
MAStrike is a closed-loop framework for collusive red-teaming of hierarchical multi-agent systems (MAS). It addresses limitations of existing red-teaming methods by introducing the first agent-level Shapley value analysis, which quantifies each agent's marginal contribution to system robustness under task-specific distributions. Building on this attribution, an autonomous red-teaming agent identifies vulnerable coalitions and generates coordinated, role-aware adversarial manipulations, which are iteratively refined through structured failure diagnosis. The framework includes MABench, a comprehensive MAS red-teaming benchmark spanning finance, software engineering, and CRM. Experiments show that MAStrike significantly outperforms heuristic baselines, achieving a 61.8% attack success rate (ASR) against Claude Opus 4.7 and 55.6% against GPT-5.5, and uncovers critical vulnerabilities overlooked by prior methods.
Key takeaway
For AI Security Engineers evaluating multi-agent system robustness, MAStrike demonstrates that uncoordinated red-teaming is largely ineffective. You should prioritize identifying high-impact agent coalitions using quantitative methods like Shapley values and design coordinated, role-aware adversarial manipulations. This approach is crucial for uncovering distributed vulnerabilities that single-agent or template-based attacks miss, especially in high-stakes domains like finance and software engineering.
Key insights
Quantifying agent contribution via Shapley values enables targeted, coordinated collusive attacks on multi-agent systems.
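For reference, the standard Shapley value from cooperative game theory can be written as below. Here N is the set of agents and v(S) is a characteristic function over coalitions S. This summary does not state which v the paper uses, so reading v(S) as, for example, the attack success rate measured when exactly the agents in S are manipulated, averaged over the task distribution, is an assumption made for illustration.

```latex
\phi_i(v) \;=\; \sum_{S \subseteq N \setminus \{i\}}
  \frac{|S|!\,\bigl(|N| - |S| - 1\bigr)!}{|N|!}
  \,\Bigl( v\bigl(S \cup \{i\}\bigr) - v(S) \Bigr)
```

The weight on each term is the probability that coalition S precedes agent i in a uniformly random ordering of the agents, so the value is agent i's expected marginal contribution.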
Principles
- Agent importance is sparse and task-dependent.
- High individual importance does not imply strong agent coalition synergy.
- Coordinated attacks effectively bypass distributed safety mechanisms.
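The second principle can be illustrated with a small sketch. The agent names and failure rates below are hypothetical toy numbers, not results from the paper, and the pairwise synergy score (joint effect minus the sum of individual effects) is one simple choice rather than necessarily the measure MAStrike uses.

```python
from itertools import combinations

# Hypothetical toy data: fraction of tasks that fail when exactly the agents
# in the coalition are manipulated. These numbers are NOT from the paper.
FAILURE_RATE = {
    frozenset(): 0.0,
    frozenset({"planner"}): 0.30,
    frozenset({"coder"}): 0.25,
    frozenset({"reviewer"}): 0.05,
    frozenset({"planner", "coder"}): 0.35,
    frozenset({"planner", "reviewer"}): 0.40,
    frozenset({"coder", "reviewer"}): 0.80,
}

def synergy(a, b):
    """Joint effect of a pair minus the sum of the two individual effects."""
    joint = FAILURE_RATE[frozenset({a, b})]
    return joint - FAILURE_RATE[frozenset({a})] - FAILURE_RATE[frozenset({b})]

def rank_pairs(agents):
    """Return (synergy, pair) tuples sorted from most to least synergistic."""
    pairs = combinations(sorted(agents), 2)
    return sorted(((synergy(a, b), (a, b)) for a, b in pairs), reverse=True)

AGENTS = ["planner", "coder", "reviewer"]
ranked = rank_pairs(AGENTS)
for score, pair in ranked:
    print(f"{pair}: synergy {score:+.2f}")
```

In this toy setup the two individually strongest agents (planner and coder) have negative synergy, while the pair that includes the individually weakest agent (reviewer) has the highest, which is the pattern the principle describes.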
Method
MAStrike uses Shapley value analysis to attribute agent vulnerability, selects synergy-aware coalitions, and iteratively refines coordinated, role-aware adversarial manipulations via a closed-loop red-teaming agent.
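The closed-loop part of this description can be sketched as a generic execute, diagnose, revise loop. The function names, the dictionary-shaped outcome, and the integer "plan" are all hypothetical placeholders chosen so the example runs; the paper's actual agent, diagnosis format, and stopping rule are not given in this summary.

```python
def refine(plan, execute, diagnose, revise, max_rounds=5):
    """Run execute -> diagnose -> revise until success or the round budget ends."""
    history = []
    for _ in range(max_rounds):
        outcome = execute(plan)
        history.append((plan, outcome))
        if outcome["success"]:
            break
        # Feed the structured diagnosis back into the next revision.
        plan = revise(plan, diagnose(outcome))
    return plan, history

# Toy stand-ins (assumptions) so the loop runs end to end. A real evaluation
# harness would run the system under test in a sandbox and parse its trace.
def execute(plan):
    return {"success": plan >= 3, "trace": f"ran plan {plan}"}

def diagnose(outcome):
    return "ok" if outcome["success"] else "blocked"

def revise(plan, diagnosis):
    return plan + 1 if diagnosis == "blocked" else plan

final_plan, history = refine(0, execute, diagnose, revise)
print(final_plan, len(history))
```

Keeping the full history of plans and outcomes makes each round auditable, which matters when the loop is used for robustness evaluation.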
In practice
- Use Shapley values to identify critical agents.
- Generate mutually consistent attack prompts for coalitions.
- Iteratively refine attacks with execution feedback.
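As a minimal sketch of the first step, exact Shapley values can be computed by enumerating coalitions when the number of agents is small. The `value` function below is a hypothetical toy (it does not come from the paper) in which the system fails completely only when both the coder and the reviewer are manipulated; in practice the value of a coalition would be measured empirically, and sampling would replace enumeration for larger systems.

```python
from itertools import combinations
from math import factorial

def shapley_values(agents, value):
    """Exact Shapley values; `value` maps a frozenset of agents to a number."""
    n = len(agents)
    phi = {}
    for agent in agents:
        others = [a for a in agents if a != agent]
        total = 0.0
        for k in range(n):
            # Probability that a size-k coalition precedes `agent` in a random order.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                total += weight * (value(s | {agent}) - value(s))
        phi[agent] = total
    return phi

def value(coalition):
    """Hypothetical attack success rate when `coalition` is manipulated."""
    if {"coder", "reviewer"} <= coalition:
        return 1.0
    if "planner" in coalition:
        return 0.2
    return 0.0

AGENTS = ["planner", "coder", "reviewer"]
phi = shapley_values(AGENTS, value)
for name, score in sorted(phi.items(), key=lambda kv: -kv[1]):
    print(f"{name}: {score:.3f}")
```

The scores sum to the value of the full coalition, a standard property of the Shapley value that serves as a quick sanity check on any implementation.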
Topics
- Multi-Agent Systems
- Red-Teaming
- Shapley Values
- Collusive Attacks
- MAS Security
- LLM Agents
- MABench
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.