Agents That Know Too Much: A Data-Centric Survey of Privacy in LLM Agents

2026-06-25 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

The survey "Agents That Know Too Much: A Data-Centric Survey of Privacy in LLM Agents" examines the privacy challenges that arise as large language model agents increasingly interact with sensitive data sources such as databases, document collections, and external APIs. Because these agents maintain state across sessions and act with delegated permissions, they can leak information at many points beyond the final answer: queries, intermediate results, memory writes, and inter-agent communications. The survey organizes the field by the data an agent touches, building a taxonomy of data sources, associated privacy risks, and governance mechanisms. It then maps existing benchmarks and identifies gaps. Two findings stand out: information-flow control is the only governance mechanism that addresses compositional and cross-session inference leakage, and there are no benchmarks that evaluate an agent's privacy adherence across all of its data surfaces under a single policy.

Key takeaway

For AI security engineers designing LLM agent systems, this survey underlines the need to move beyond output-centric privacy controls. Prioritize information-flow control to mitigate compositional and cross-session inference leakage, which the survey finds no other governance mechanism addresses. Write privacy policies that span every data surface an agent touches (queries, intermediate results, memory writes, inter-agent messages, and final answers), because current benchmarks do not evaluate adherence across all of those surfaces under a single policy.

Key insights

Because LLM agents touch many data sources, privacy risks arise at every data surface, which calls for governance organized around the data rather than only the output.

Principles

Privacy risks extend beyond final answers to intermediate data flows.
Information-flow control is the only governance mechanism that addresses compositional and cross-session inference leakage.
Current benchmarks lack unified privacy policy evaluation across data surfaces.
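
As a rough illustration of the information-flow control principle, the sketch below tags each value with the data sources it was derived from and checks those labels before the value reaches a sink. It is a minimal sketch under stated assumptions, not the survey's method: the names Labeled, combine, may_flow, FORBIDDEN_COMBINATIONS, and the example source and sink names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Labeled:
    """A value paired with the set of data sources it was derived from."""
    value: str
    labels: frozenset = field(default_factory=frozenset)


def combine(*parts: Labeled, sep: str = " ") -> Labeled:
    """Derive a new value; its label set is the union of its inputs' labels."""
    merged = frozenset().union(*(p.labels for p in parts))
    return Labeled(sep.join(p.value for p in parts), merged)


# Hypothetical policy: label combinations that must never reach a sink together,
# even if each label is individually allowed there (compositional leakage).
FORBIDDEN_COMBINATIONS = [frozenset({"hr_db", "medical_docs"})]


def may_flow(item: Labeled, sink: str, sink_clearance: dict) -> bool:
    """Allow the flow only if the sink is cleared for every label on the item
    and the item does not carry a forbidden combination of labels."""
    cleared = sink_clearance.get(sink, frozenset())
    if not item.labels <= cleared:
        return False
    return not any(combo <= item.labels for combo in FORBIDDEN_COMBINATIONS)


# Hypothetical sinks and their clearances.
clearance = {
    "final_answer": frozenset({"hr_db", "medical_docs"}),
    "memory": frozenset({"hr_db"}),
}
name = Labeled("Alice works in payroll", frozenset({"hr_db"}))
note = Labeled("patient 17 has condition X", frozenset({"medical_docs"}))
joined = combine(name, note)

print(may_flow(name, "final_answer", clearance))    # single-source value
print(may_flow(joined, "final_answer", clearance))  # composed value
print(may_flow(note, "memory", clearance))          # write to persistent memory
```

In this sketch, each source is individually cleared for the final answer, but the composed value is blocked because it joins two sources the policy forbids combining. Storing the labels alongside memory entries would let the same check run when a later session reads them back, which is one plausible way to approach cross-session leakage.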

Method

The survey builds a taxonomy of agent data sources, privacy risks, and governance mechanisms, then maps existing benchmarks and identifies open problems.

In practice

Implement information-flow control for multi-step agent workflows.
Design privacy policies covering all agent data interactions.
Develop benchmarks that evaluate privacy across all of an agent's data surfaces under a single policy.
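
One way to picture a single policy applied across every data surface is an audit over an agent trace, sketched below. This is an illustrative assumption rather than a benchmark from the survey: the Surface enum, Event record, POLICY table, and audit function are hypothetical, and the sketch assumes an upstream step has already detected which data categories appear in each event.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Surface(Enum):
    """Leakage points named in the summary, plus the final answer."""
    QUERY = "query"
    INTERMEDIATE_RESULT = "intermediate_result"
    MEMORY_WRITE = "memory_write"
    INTER_AGENT_MESSAGE = "inter_agent_message"
    FINAL_ANSWER = "final_answer"


@dataclass(frozen=True)
class Event:
    surface: Surface
    categories: frozenset  # data categories detected in this event (assumed given)


# Hypothetical single policy: which data categories each surface may carry.
POLICY = {
    Surface.QUERY: frozenset({"customer_id"}),
    Surface.INTERMEDIATE_RESULT: frozenset({"customer_id", "email"}),
    Surface.MEMORY_WRITE: frozenset(),
    Surface.INTER_AGENT_MESSAGE: frozenset({"customer_id"}),
    Surface.FINAL_ANSWER: frozenset({"customer_id"}),
}


def audit(trace):
    """Count policy violations per surface for one agent trace."""
    violations = Counter()
    for event in trace:
        allowed = POLICY.get(event.surface, frozenset())
        if not event.categories <= allowed:
            violations[event.surface] += 1
    return violations


# A made-up trace: the final answer is clean, but two other surfaces leak.
trace = [
    Event(Surface.QUERY, frozenset({"customer_id"})),
    Event(Surface.INTERMEDIATE_RESULT, frozenset({"customer_id", "email"})),
    Event(Surface.MEMORY_WRITE, frozenset({"email"})),
    Event(Surface.INTER_AGENT_MESSAGE, frozenset({"customer_id", "email"})),
    Event(Surface.FINAL_ANSWER, frozenset({"customer_id"})),
]
report = audit(trace)
for surface, count in report.items():
    print(surface.value, count)
```

The made-up trace shows why output-only checks can miss leaks: the final answer passes, while the memory write and the inter-agent message each violate the same policy.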

Topics

LLM Agents
Privacy
Data Security
Information Flow Control
Retrieval-Augmented Generation
Agent Benchmarking

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.