MIRROR: Novelty-Constrained Memory-Guided MCTS Red-Teaming for Agentic RAG
Summary
MIRROR is a novel, unified red-teaming framework designed to address the expanded attack surface of multimodal agentic Retrieval-Augmented Generation (RAG) systems, which include vulnerabilities like text poisoning, image injection, direct-query attacks, and orchestrator-level tool manipulation. Existing red-teaming approaches are often surface-specific and suffer from high attack template duplication, measuring 73-84% on text-poisoning benchmarks. MIRROR employs memory-guided Monte Carlo tree search, conditioning candidate generation on retrieved context under an explicit novelty constraint. A deterministic Novelty Gate prevents prompt copying by rejecting candidates matching the retrieval set. Across four attack surfaces, MIRROR achieved a 76% Attack Success Rate (ASR) on image poisoning, significantly outperforming baselines at 52%. It also reached 97% ASR on orchestrator attacks with half the query cost and demonstrated the lowest cross-surface variance (coefficient of variation 0.47). The framework is released with ART-SafeBench, comprising 41,815 in-package records and over 41,991 total records across four surfaces.
Key takeaway
For AI Security Engineers developing or deploying multimodal agentic RAG systems, your current red-teaming strategies are likely insufficient against the expanded attack surface. MIRROR provides a unified, novelty-constrained Monte Carlo tree search framework that significantly improves attack success rates across diverse vectors like image poisoning and orchestrator attacks, while reducing query costs. You should integrate MIRROR and the ART-SafeBench dataset into your security testing pipeline to achieve more comprehensive and efficient vulnerability discovery.
Key insights
MIRROR unifies red-teaming for agentic RAG by using novelty-constrained MCTS to find diverse, effective attacks.
Principles
- Multimodal agentic RAG expands attack surfaces.
- Novelty constraints improve red-teaming effectiveness.
- Unified frameworks outperform specialized baselines.
Method
MIRROR employs memory-guided Monte Carlo tree search, conditioning candidate generation on retrieved context under a deterministic Novelty Gate to ensure attack diversity and prevent duplication.
In practice
- Apply MIRROR for comprehensive agentic RAG red-teaming.
- Utilize novelty constraints to generate diverse attack vectors.
- Evaluate RAG systems with ART-SafeBench.
Topics
- Agentic RAG
- AI Security
- Red Teaming
- Monte Carlo Tree Search
- Multimodal AI
- ART-SafeBench
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.