RAG Security and Privacy: Formalizing the Threat Model and Attack Surface

Source: cs.AI updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

Retrieval-Augmented Generation (RAG) systems pair large language models (LLMs) with external document retrieval. This improves factual accuracy but introduces privacy and security risks beyond those of a standalone LLM. The paper addresses this gap by proposing what it describes as the first formal threat model for RAG systems. It introduces a structured taxonomy of adversary types, categorized by their access to model components and data: black-box vs. white-box access, and normal vs. informed knowledge. It then formally defines key threat vectors that pose privacy and integrity risks in real-world deployments, including document-level membership inference attacks (DL-MIA), which infer whether a given document is present in the knowledge base; content leakage attacks, in which sensitive information is reconstructed from generated outputs; and data poisoning attacks, which inject malicious documents to manipulate model behavior. The aim is a rigorous foundation for reasoning about security and privacy in RAG systems.
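To make the DL-MIA idea concrete, here is a minimal sketch of a black-box membership test against a toy pipeline. Everything in it is an assumption made for illustration: ToyRAG, jaccard, infer_membership, and the 0.8 threshold are hypothetical names and values, not the paper's formal definitions. The toy "generator" simply echoes the retrieved document, which exaggerates leakage compared with a real LLM.

```python
def tokens(text):
    """Lowercased bag of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(a, b):
    """Word-level Jaccard similarity between two strings."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


class ToyRAG:
    """Hypothetical stand-in for a RAG pipeline (not a real library)."""

    def __init__(self, knowledge_base):
        self.kb = list(knowledge_base)

    def answer(self, query):
        # Retrieve the most similar document and "generate" by echoing it.
        return max(self.kb, key=lambda doc: jaccard(query, doc))


def infer_membership(rag, candidate, threshold=0.8):
    """Black-box guess: query with the candidate document itself and
    flag it as a member if the response closely matches it."""
    response = rag.answer(candidate)
    return jaccard(response, candidate) >= threshold


kb = [
    "the patient was prescribed 20 mg of drug x",
    "quarterly revenue grew by four percent",
]
rag = ToyRAG(kb)
member_guess = infer_membership(rag, "the patient was prescribed 20 mg of drug x")
non_member_guess = infer_membership(rag, "the merger closes next friday pending approval")
```

The attacker here needs only query access, which is what makes the black-box setting worth modelling separately from white-box access.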

Key takeaway

If you are an AI security engineer deploying RAG systems, treat the architecture as having attack surfaces beyond those of a standalone LLM. Apply differential privacy at the retriever to guard against document-level membership inference. Use robust prompt engineering and adversarial training to limit sensitive content leakage. Add embedding-aware filtering and query-response analysis to mitigate data poisoning of the knowledge base. The defense strategy needs to span every RAG component, not just the LLM.
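One way to picture privacy at the retriever is randomized document selection, sketched below in the style of the exponential mechanism. This is an illustration under stated assumptions, not the paper's method and not a proof of differential privacy: a real guarantee depends on how neighbouring knowledge bases and score sensitivity are defined, which this summary does not specify. The function names, the epsilon values, and the assumption that scores lie in [0, 1] are all hypothetical.

```python
import math
import random


def selection_probabilities(scores, epsilon, sensitivity=1.0):
    """Map each document's relevance score to a sampling probability
    proportional to exp(epsilon * score / (2 * sensitivity))."""
    top = max(scores.values())  # subtract the max for numerical stability
    weights = {
        doc: math.exp(epsilon * (score - top) / (2.0 * sensitivity))
        for doc, score in scores.items()
    }
    total = sum(weights.values())
    return {doc: w / total for doc, w in weights.items()}


def noisy_select(scores, epsilon, sensitivity=1.0, rng=None):
    """Sample one document instead of always returning the top match."""
    rng = rng or random.Random()
    probs = selection_probabilities(scores, epsilon, sensitivity)
    docs = list(probs)
    return rng.choices(docs, weights=[probs[d] for d in docs], k=1)[0]


# Assumed similarity scores in [0, 1] for two hypothetical documents.
scores = {"doc_a": 0.9, "doc_b": 0.1}
uniform = selection_probabilities(scores, epsilon=0.0)
sharp = selection_probabilities(scores, epsilon=1000.0)
picked = noisy_select(scores, epsilon=1000.0, rng=random.Random(0))
```

A small epsilon flattens the distribution, so retrieval reveals less about which documents exist, at the cost of relevance. A large epsilon approaches ordinary top-1 retrieval.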

Key insights

RAG systems introduce unique privacy and security risks requiring a formal threat model and specific defense strategies.

Method

The paper formalizes a RAG threat model by defining adversary types based on model access (black-box/white-box) and knowledge (normal/informed), then formally characterizes document-level membership inference, content leakage, and data poisoning attacks.
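The two axes of this taxonomy can be written down as a small data structure. The sketch below is only one assumed encoding: the class names are hypothetical, and the one-line glosses in the comments are a plausible reading of the terms, not the paper's formal definitions.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Access(Enum):
    BLACK_BOX = "black-box"  # assumed: sees only queries and generated outputs
    WHITE_BOX = "white-box"  # assumed: can also inspect model components


class Knowledge(Enum):
    NORMAL = "normal"        # assumed: no side information about the data
    INFORMED = "informed"    # assumed: holds some prior knowledge of the data


@dataclass(frozen=True)
class Adversary:
    access: Access
    knowledge: Knowledge

    def label(self):
        return f"{self.access.value}/{self.knowledge.value}"


# Crossing the two axes yields four adversary types.
ADVERSARIES = [Adversary(a, k) for a, k in product(Access, Knowledge)]
labels = [adv.label() for adv in ADVERSARIES]
```

Enumerating the combinations explicitly makes it easy to check that each attack (membership inference, content leakage, poisoning) has been considered against every adversary type.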

Best for: AI Architect, Research Scientist, CTO, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.