MIRROR: Novelty-Constrained Memory-Guided MCTS Red-Teaming for Agentic RAG

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

MIRROR is a novel, unified red-teaming framework designed to address the expanded attack surface of multimodal agentic Retrieval-Augmented Generation (RAG) systems, which include vulnerabilities like text poisoning, image injection, direct-query attacks, and orchestrator-level tool manipulation. Existing red-teaming approaches are often surface-specific and suffer from high attack template duplication, measuring 73-84% on text-poisoning benchmarks. MIRROR employs memory-guided Monte Carlo tree search, conditioning candidate generation on retrieved context under an explicit novelty constraint. A deterministic Novelty Gate prevents prompt copying by rejecting candidates matching the retrieval set. Across four attack surfaces, MIRROR achieved a 76% Attack Success Rate (ASR) on image poisoning, significantly outperforming baselines at 52%. It also reached 97% ASR on orchestrator attacks with half the query cost and demonstrated the lowest cross-surface variance (coefficient of variation 0.47). The framework is released with ART-SafeBench, comprising 41,815 in-package records and over 41,991 total records across four surfaces.

Key takeaway

For AI Security Engineers developing or deploying multimodal agentic RAG systems, your current red-teaming strategies are likely insufficient against the expanded attack surface. MIRROR provides a unified, novelty-constrained Monte Carlo tree search framework that significantly improves attack success rates across diverse vectors like image poisoning and orchestrator attacks, while reducing query costs. You should integrate MIRROR and the ART-SafeBench dataset into your security testing pipeline to achieve more comprehensive and efficient vulnerability discovery.

Key insights

MIRROR unifies red-teaming for agentic RAG by using novelty-constrained MCTS to find diverse, effective attacks.

Principles

Method

MIRROR employs memory-guided Monte Carlo tree search, conditioning candidate generation on retrieved context under a deterministic Novelty Gate to ensure attack diversity and prevent duplication.

In practice

Topics

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.