Privacy-Preserving RAG via Multi-Agent Semantic Rewriting: Achieving Confidentiality Without Compromising Contextual Fidelity

2026-06-23 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Advanced, quick

Summary

A multi-agent framework is proposed to strengthen privacy in Retrieval-Augmented Generation (RAG) systems by sanitizing retrieved content. It targets the risk that, in sensitive settings, malicious prompts extract private information from what the system retrieves. The framework uses three specialized agents (privacy extraction, semantic analysis, and reconstruction) that together remove sensitive identifiers while preserving the semantic core of each passage. Evaluated on the ChatDoctor and Wiki-PII datasets across six large language models, the system reduced targeted information exposure for LLaMA-3-8B from 144 instances to 1. It also preserved contextual fidelity, scoring 0.122 BLEU-1 against 0.117 for the existing SAGE method. Because it runs as an asynchronous preprocessing module, it adds no latency to online inference.
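Contextual fidelity above is reported as BLEU-1, which is clipped unigram precision multiplied by a brevity penalty. The sketch below shows that computation for a single sentence pair, assuming lowercase whitespace tokenization and one reference; the tokenization and corpus-level averaging used in the original evaluation are not stated here and may differ.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    Assumption: lowercase whitespace tokenization and a single reference.
    The paper's exact evaluation setup may tokenize or aggregate differently.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty: outputs shorter than the reference are scaled down.
    if len(cand) > len(ref):
        bp = 1.0
    else:
        bp = math.exp(1 - len(ref) / len(cand))
    return bp * precision


if __name__ == "__main__":
    reference = "the patient reports chest pain after exercise"
    rewritten = "patient reports chest pain after exercise"
    print(round(bleu1(rewritten, reference), 3))
```

A rewrite that drops identifiers but keeps the clinical content, as in the usage example, loses only a little BLEU-1 through the brevity penalty. That is why the metric can act as a rough fidelity check for sanitized text.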

Key takeaway

For AI Security Engineers deploying RAG in sensitive environments, this multi-agent semantic rewriting framework offers a practical way to reduce privacy leakage. In the reported experiments it cut targeted information exposure for LLaMA-3-8B from 144 instances to 1 while keeping contextual fidelity slightly above the SAGE baseline. Consider integrating it as an asynchronous preprocessing step so that retrieved content is sanitized before it ever reaches the generation stage.

Key insights

A multi-agent framework sanitizes RAG content via semantic rewriting, achieving privacy without compromising contextual fidelity.

Principles

Collaborative agents enhance RAG privacy.
Asynchronous preprocessing avoids added inference latency.
Semantic rewriting preserves context.

Method

Three specialized agents perform privacy extraction, semantic analysis, and content reconstruction. This process removes sensitive identifiers while maintaining the semantic core.
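The three-stage flow can be sketched as extraction, then analysis, then reconstruction. In the described framework each stage is an LLM agent; in this minimal sketch, regex rules and fixed substitutions stand in for the agents so the example runs offline. The function names, patterns, and replacement phrases are hypothetical and are not the API of the foursoils/Privacy-Preserving-RAG repository.

```python
import re
from dataclasses import dataclass

# Hypothetical stand-ins for the privacy-extraction agent: simple regexes.
PII_PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "NAME": re.compile(r"\b(?:Mr|Ms|Mrs|Dr)\. [A-Z][a-z]+\b"),
}


@dataclass
class Analysis:
    spans: list[tuple[str, str]]  # (entity type, surface text) found by agent 1
    core: str                     # passage with identifiers abstracted to placeholders


def extract_privacy(text: str) -> list[tuple[str, str]]:
    """Agent 1 (privacy extraction): locate sensitive identifiers."""
    found = []
    for label, pattern in PII_PATTERNS.items():
        found.extend((label, m.group(0)) for m in pattern.finditer(text))
    return found


def analyze_semantics(text: str, spans: list[tuple[str, str]]) -> Analysis:
    """Agent 2 (semantic analysis): separate identifiers from the semantic core."""
    core = text
    for label, surface in spans:
        core = core.replace(surface, f"<{label}>")
    return Analysis(spans=spans, core=core)


def reconstruct(analysis: Analysis) -> str:
    """Agent 3 (reconstruction): rewrite placeholders into neutral wording."""
    generic = {
        "EMAIL": "an email address",
        "PHONE": "a phone number",
        "NAME": "the patient",
    }
    out = analysis.core
    for label, phrase in generic.items():
        out = out.replace(f"<{label}>", phrase)
    return out


def sanitize(text: str) -> str:
    """Run the three stages in order on one retrieved passage."""
    spans = extract_privacy(text)
    return reconstruct(analyze_semantics(text, spans))


if __name__ == "__main__":
    passage = ("Contact Mr. Smith at smith@example.com or 555-123-4567; "
               "he reports chest pain after exercise.")
    print(sanitize(passage))
```

Keeping the stages as separate functions mirrors the division of labor described above. Each function could be swapped for an LLM call, which is what would let the analysis and reconstruction stages preserve meaning that fixed rules cannot capture.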

In practice

Reduce PII leakage in RAG.
Maintain high contextual fidelity.
Preprocess data offline for RAG.
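Sanitization runs before indexing, so the online retrieval path only reads text that has already been rewritten. The sketch below illustrates that split with asyncio. `sanitize_chunk` is a hypothetical placeholder for the three-agent rewrite (here a regex email mask plus a short sleep that simulates model latency), and a plain dict stands in for a vector store.

```python
import asyncio
import re

EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+")


async def sanitize_chunk(chunk: str) -> str:
    """Hypothetical placeholder for the multi-agent rewrite of one chunk.

    A real deployment would await LLM calls here; a regex mask and a short
    sleep stand in for them so the sketch runs without external services.
    """
    await asyncio.sleep(0.01)  # simulate model latency
    return EMAIL.sub("[email]", chunk)


async def build_index(corpus: dict[str, str], max_concurrency: int = 4) -> dict[str, str]:
    """Offline path: sanitize every chunk, then store only the rewritten text."""
    sem = asyncio.Semaphore(max_concurrency)

    async def worker(doc_id: str, chunk: str) -> tuple[str, str]:
        async with sem:  # bound the number of concurrent model calls
            return doc_id, await sanitize_chunk(chunk)

    results = await asyncio.gather(*(worker(i, c) for i, c in corpus.items()))
    return dict(results)


def retrieve(index: dict[str, str], doc_id: str) -> str:
    """Online path: a plain lookup over pre-sanitized text, no rewriting step."""
    return index[doc_id]


if __name__ == "__main__":
    corpus = {
        "d1": "Reach the clinic at clinic@example.org for results.",
        "d2": "No identifiers in this chunk.",
    }
    index = asyncio.run(build_index(corpus))
    print(retrieve(index, "d1"))
```

The design choice is that all rewriting cost is paid once, at indexing time. Query-time work is unchanged, which is consistent with the claim that the module adds no online inference latency.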

Topics

Retrieval-Augmented Generation
Privacy Preservation
Multi-Agent Systems
Semantic Rewriting
Large Language Models
Data Sanitization

Code references

foursoils/Privacy-Preserving-RAG

Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.