MosaicLeaks: Can your research agent keep a secret?

2026-06-18 · Source: Hugging Face - Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy, Robotics & Autonomous Systems · Depth: Expert, long

Summary

MosaicLeaks, a new deep-research task and benchmark, exposes significant privacy risks in AI agents that combine private local documents with external web retrieval. Published on June 18, 2026, by Alexander Gurung and Rafael Pardinas from ServiceNow, the work shows that agents frequently leak sensitive information through their external queries, which an observer can piece together to recover private facts (the "mosaic effect"). On the benchmark's 1,001 multi-hop research chains, training agents solely for task performance made leakage worse: answer/full-information leakage for Qwen3-4B rose from 34.0% to 51.7%. To address this, the authors propose Privacy-Aware Deep Research (PA-DR), an RL training method that combines situational task rewards with a privacy reward from a Qwen3-4B classifier. PA-DR improved strict chain success from 48.7% to 58.7% while cutting answer/full-information leakage to 9.9%, supporting the authors' conclusion that privacy must be trained in, not merely prompted.

Key takeaway

For MLOps engineers deploying deep research agents that access both private and public data: optimizing for task performance alone increases privacy leakage. Integrate privacy-aware training methods such as PA-DR, which combines situational task rewards with a learned privacy reward; in the reported results it reached 58.7% strict chain success while cutting answer/full-information leakage to 9.9%. Do not rely on prompting alone to secure your agents; explicit privacy training is needed to keep sensitive data from surfacing in query logs through the "mosaic effect."

Key insights

Training deep research agents for performance alone increases privacy leakage; privacy must be explicitly trained in.

Principles

Mosaic effect: cumulative web queries can reveal private facts.
Task performance often conflicts with privacy in agent design.
Prompting agents for privacy is largely ineffective.
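
To make the mosaic effect concrete, here is a minimal sketch of how individually harmless queries can add up to a leak. It is an illustration only, not the benchmark's leakage metric: the function names (`leaked_facts`, `mosaic_leak`), the example facts, and the case-insensitive substring match are all assumptions made for this example.

```python
from typing import Dict, Iterable, Set


def leaked_facts(queries: Iterable[str], private_facts: Dict[str, str]) -> Set[str]:
    """Return the names of private facts whose value appears in any query.

    Uses a crude case-insensitive substring match (an assumption for this
    sketch; a real detector would need something far more robust).
    """
    seen: Set[str] = set()
    for query in queries:
        lowered = query.lower()
        for name, value in private_facts.items():
            if value.lower() in lowered:
                seen.add(name)
    return seen


def mosaic_leak(queries: Iterable[str],
                private_facts: Dict[str, str],
                required: Iterable[str]) -> bool:
    """True when the union of facts leaked across ALL queries covers `required`."""
    return set(required) <= leaked_facts(queries, private_facts)


# Hypothetical example data: no single query reveals both facts,
# but the query log as a whole does.
facts = {"vendor": "Acme Corp", "project": "Falcon"}
log = ["acme corp pricing tiers", "falcon release schedule"]
print(mosaic_leak(log[:1], facts, ["vendor", "project"]))  # first query alone
print(mosaic_leak(log, facts, ["vendor", "project"]))      # whole log
```

The point of checking the union over the whole log, rather than each query in isolation, is that a per-query filter would pass both queries above even though together they expose the full private context.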

Method

Privacy-Aware Deep Research (PA-DR) uses situational task rewards and a Qwen3-4B classifier-based privacy reward to penalize revealing web queries, training agents for both performance and reduced leakage.
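
One way such a combined reward could be shaped is sketched below. The summary states only that PA-DR pairs situational task rewards with a classifier-based privacy reward; the specific form here (subtracting a weighted penalty for the single most revealing query) is an assumption for illustration, not the authors' formula. `combined_reward`, `leak_prob`, and `penalty_weight` are hypothetical names, and `leak_prob` stands in for whatever privacy classifier is used.

```python
from typing import Callable, Sequence


def combined_reward(task_reward: float,
                    queries: Sequence[str],
                    leak_prob: Callable[[str], float],
                    penalty_weight: float = 1.0) -> float:
    """Task reward minus a penalty for the most revealing web query.

    `leak_prob` maps a query to a score in [0, 1]; in this sketch it is a
    placeholder for a learned privacy classifier (assumption).
    """
    if not queries:
        # No external queries were issued, so there is nothing to penalize.
        return task_reward
    worst = max(leak_prob(q) for q in queries)
    return task_reward - penalty_weight * worst


# Toy stand-in classifier (hypothetical): flags queries containing "secret".
def toy_classifier(query: str) -> float:
    return 1.0 if "secret" in query.lower() else 0.0


print(combined_reward(1.0, ["public market trends"], toy_classifier))
print(combined_reward(1.0, ["public topic", "secret codename"], toy_classifier,
                      penalty_weight=0.5))
```

Penalizing the worst query rather than the average is a design choice of this sketch: it keeps one revealing query from being diluted by many harmless ones. Other aggregations are equally plausible.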

In practice

Use multi-hop questions to test agent privacy risks.
Implement situational rewards for precise credit assignment.
Integrate privacy classifiers into RL training loops.

Topics

Deep Research Agents
Privacy Leakage
Mosaic Effect
Reinforcement Learning
Privacy-Aware Deep Research (PA-DR)
Qwen3-4B

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.