RAG Security and Privacy: Formalizing the Threat Model and Attack Surface

2025-09-05 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

Retrieval-Augmented Generation (RAG) systems combine large language models (LLMs) with external document retrieval. This improves factual accuracy but introduces privacy and security challenges that differ from those of traditional LLMs. The paper addresses this gap by proposing what it presents as the first formal threat model for RAG systems. It introduces a structured taxonomy of adversary types, categorized by their access to model components and data: black-box vs. white-box access, and normal vs. informed knowledge. It then formally defines the key threat vectors that pose privacy and integrity risks in real-world deployments: document-level membership inference attacks (DL-MIA), which infer whether a given document is present in the knowledge base; content leakage attacks, in which sensitive information is reconstructed from generated outputs; and data poisoning attacks, which inject malicious documents to manipulate model behavior. The aim of the formalization is a rigorous understanding of security and privacy in RAG systems.

Key takeaway

AI security engineers deploying RAG systems should recognize that these architectures add attack surfaces beyond traditional LLM risks. Consider retriever-level differential privacy to limit document-level membership inference, prompt engineering and adversarial training to reduce sensitive content leakage, and embedding-aware filtering with query-response analysis to mitigate data poisoning of the knowledge base. The defense strategy needs to span every RAG component, not just the LLM.
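As one illustration of embedding-aware filtering, the sketch below flags candidate documents whose embeddings lie unusually far from the centroid of an already trusted corpus. It is a minimal example under stated assumptions, not the paper's method: the function names, the z-score rule, and the threshold of 3.0 are all hypothetical choices, and embeddings are assumed to be nonzero vectors of equal length. A filter like this only catches out-of-distribution documents; poisoned documents crafted to resemble legitimate ones would pass it.

```python
import math
from statistics import mean, pstdev


def centroid(vectors):
    # Component-wise mean of the trusted embeddings.
    return [mean(col) for col in zip(*vectors)]


def cosine_distance(u, v):
    # 1 - cosine similarity; assumes neither vector is all zeros.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def flag_outliers(trusted, candidates, z_threshold=3.0):
    """Return indices of candidates that sit anomalously far from the trusted centroid.

    The z-score rule and threshold are illustrative assumptions.
    """
    c = centroid(trusted)
    base = [cosine_distance(v, c) for v in trusted]
    mu, sigma = mean(base), pstdev(base)
    flagged = []
    for i, v in enumerate(candidates):
        z = (cosine_distance(v, c) - mu) / (sigma or 1e-12)
        if z > z_threshold:
            flagged.append(i)
    return flagged


# Toy 2-D embeddings (hypothetical data).
trusted = [[1.0, 0.0], [0.9, 0.1], [1.0, 0.1], [0.95, 0.05]]
candidates = [[0.97, 0.06], [-1.0, 0.2]]
print(flag_outliers(trusted, candidates))  # prints [1]
```

In a deployment, flagged documents would more plausibly be routed to review than dropped outright, since legitimate documents on new topics can also look like outliers.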

Key insights

RAG systems introduce unique privacy and security risks requiring a formal threat model and specific defense strategies.

Principles

RAG systems inherit LLM vulnerabilities and add new attack surfaces.
Adversary capabilities vary by model access and prior knowledge.
Document-level privacy is crucial for RAG knowledge bases.

Method

The paper formalizes a RAG threat model by defining adversary types based on model access (black-box/white-box) and knowledge (normal/informed), then formally characterizes document-level membership inference, content leakage, and data poisoning attacks.
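The two adversary axes and three threat vectors named above can be written down as a small data model. This is only a sketch of the taxonomy as summarized here: the class names are hypothetical, and the comments on what each level means are a reading of the summary rather than the paper's formal definitions.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class Access(Enum):
    BLACK_BOX = "black-box"   # assumed: interacts only through queries and outputs
    WHITE_BOX = "white-box"   # assumed: can also inspect model components


class Knowledge(Enum):
    NORMAL = "normal"         # assumed: no side information about the knowledge base
    INFORMED = "informed"     # assumed: holds prior knowledge about the data


class Threat(Enum):
    DL_MIA = "document-level membership inference"
    CONTENT_LEAKAGE = "content leakage"
    DATA_POISONING = "data poisoning"


@dataclass(frozen=True)
class Adversary:
    access: Access
    knowledge: Knowledge

    def label(self) -> str:
        return f"{self.access.value}, {self.knowledge.value}"


# Crossing the two axes yields four adversary types.
ADVERSARIES = [Adversary(a, k) for a, k in product(Access, Knowledge)]

for adv in ADVERSARIES:
    print(adv.label())
```

A threat assessment could then be organized as a table keyed by (Adversary, Threat) pairs, recording which defenses are expected to hold against each combination.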

In practice

Apply retriever-level differential privacy to limit document-level membership inference (DL-MIA).
Use prompt engineering and adversarial training to reduce content leakage.
Employ embedding-aware filtering to defend against data poisoning.
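To make the first recommendation concrete, the sketch below perturbs retrieval scores with Gumbel noise before selecting the top k documents, which is the sampling step behind the exponential mechanism. It is an illustrative assumption about what retriever-level differential privacy could look like, not the paper's construction. The function names, the default sensitivity of 1.0, and the even split of the budget across k selections are all hypothetical, and a formal guarantee would still require a sensitivity analysis for the actual scoring function and a definition of neighboring knowledge bases.

```python
import math
import random


def gumbel(rng, scale):
    # Inverse-CDF sample from Gumbel(0, scale); resample if u is exactly 0.
    u = rng.random()
    while u == 0.0:
        u = rng.random()
    return -scale * math.log(-math.log(u))


def noisy_top_k(scores, k, epsilon, sensitivity=1.0, rng=None):
    """Return k document ids ranked by Gumbel-perturbed scores.

    scores: dict mapping document id -> similarity score.
    The noise scale 2 * sensitivity * k / epsilon splits the budget
    evenly over k selections (an assumption of this sketch).
    """
    rng = rng or random.Random()
    scale = 2.0 * sensitivity * k / epsilon
    noisy = {doc_id: s + gumbel(rng, scale) for doc_id, s in scores.items()}
    return sorted(noisy, key=noisy.get, reverse=True)[:k]


# Hypothetical similarity scores for three documents.
scores = {"doc_a": 0.9, "doc_b": 0.5, "doc_c": 0.1}
print(noisy_top_k(scores, k=2, epsilon=1.0, rng=random.Random(0)))
```

Smaller values of epsilon add more noise, so the returned set depends less on any single document's score, at the cost of retrieval quality. As epsilon grows very large, the ranking converges to the plain top-k.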

Topics

RAG Systems
Threat Modeling
Data Privacy
Data Poisoning
Membership Inference
LLM Security

Best for: AI Architect, Research Scientist, CTO, AI Scientist, AI Security Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.