Privacy-Preserving Text Sanitization for Distributed Agents Collaboration via Disentangled Representations

2026-06-13 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

DiSan (Disentangled Sanitization) is a privacy-preserving text sanitization framework for multi-agent collaboration, integrated into Intern-Shannon. It targets privacy leakage that goes beyond explicit identifiers: distributional signatures such as formatting and vocabulary choices, which can reveal the source when distributed agents exchange text across organizational boundaries. DiSan uses a two-stream encoder to factorize text into a source-invariant role subspace, which preserves task semantics, and a source-identifying style subspace, which remains local. Federated prototype alignment and adversarial regularization enable joint training without centralizing raw text. In the reported experiments, identifier-level masking reduces TF-IDF stylometric attribution by only 18.6%, even when it masks 19.2% of tokens. In contrast, DiSan reduces answer-level PII exposure by a factor of 20 while maintaining 83% answer faithfulness on a distributed multi-agent RAG benchmark, and lowers Enron stylometric attribution by 73.2% under TF-IDF and 70.6% under a neural probe.

Key takeaway

For AI security engineers designing multi-agent collaboration systems, identifier masking alone is inadequate for text privacy, because writing style can still reveal the source. Consider sanitization frameworks like DiSan that disentangle stylistic from semantic information. In the reported experiments, DiSan cut answer-level PII exposure by a factor of 20 while keeping 83% answer faithfulness on a distributed multi-agent RAG benchmark, a level of protection that simple redaction did not reach.

Key insights

Privacy leakage in multi-agent text collaboration extends beyond explicit identifiers to stylistic patterns, necessitating disentangled sanitization.

Principles

Privacy leakage includes distributional signatures.
Disentangling style from semantics is key.
Federated learning enables privacy-preserving training.

Method

DiSan employs a two-stream encoder to factorize text into a source-invariant role subspace and a source-identifying style subspace. Federated prototype alignment and adversarial regularization enable joint training without centralizing raw text.
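The summary does not say how DiSan computes or aligns its prototypes. As a rough illustration of the general idea only, the sketch below assumes that each client averages its role-subspace embeddings per role label, shares just those averages and their counts, and that a server combines them with a count-weighted mean. The function names, the role labels, and the squared-distance alignment penalty are all hypothetical choices, not the authors' implementation.

```python
from collections import defaultdict


def local_prototypes(embeddings, labels):
    """Per-label mean of a client's role embeddings, with sample counts.

    Only these means and counts would leave the client; raw text and
    per-example embeddings stay local. (Assumed protocol, not DiSan's.)
    """
    sums, counts = {}, defaultdict(int)
    for vec, label in zip(embeddings, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] += 1
    return {lab: ([s / counts[lab] for s in acc], counts[lab])
            for lab, acc in sums.items()}


def aggregate(client_protos):
    """Server side: count-weighted mean of the clients' prototypes."""
    totals, weights = {}, defaultdict(int)
    for protos in client_protos:
        for label, (mean, n) in protos.items():
            acc = totals.setdefault(label, [0.0] * len(mean))
            for i, x in enumerate(mean):
                acc[i] += n * x
            weights[label] += n
    return {lab: [x / weights[lab] for x in acc] for lab, acc in totals.items()}


def alignment_loss(local, global_protos):
    """Mean squared distance between local and global prototypes."""
    total = 0.0
    for label, (mean, _) in local.items():
        total += sum((a - b) ** 2 for a, b in zip(mean, global_protos[label]))
    return total / len(local)


# Toy 2-D role embeddings for two organizations (made-up numbers).
client_a = local_prototypes([[0.0, 0.0], [2.0, 0.0], [0.0, 4.0]],
                            ["query", "query", "evidence"])
client_b = local_prototypes([[4.0, 0.0], [0.0, 2.0], [0.0, 2.0]],
                            ["query", "evidence", "evidence"])
global_protos = aggregate([client_a, client_b])
print(global_protos)
print(alignment_loss(client_a, global_protos))
```

In a training loop, each client would add a penalty like `alignment_loss` to its local objective so that role embeddings from different organizations converge on shared prototypes, which is one plausible way to encourage a source-invariant role subspace.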

In practice

Identifier masking is insufficient for privacy.
DiSan reduces answer-level PII exposure 20x on a distributed multi-agent RAG benchmark.
DiSan retains 83% answer faithfulness after sanitization.
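To make the first point concrete, here is a minimal toy sketch of TF-IDF stylometric attribution applied to text whose identifiers have already been masked. It is not the evaluation setup from the article: the two sample messages, the `KNOWN_NAMES` list, the regex-based masker, and the hand-rolled smoothed TF-IDF are all assumptions made for illustration. The attacker sees only masked text, yet greetings, punctuation habits, and vocabulary still separate the two sources.

```python
import math
import re
from collections import Counter

# Hypothetical identifier list; a real masker would use NER or pattern libraries.
KNOWN_NAMES = ["Alice Smith", "Bob"]


def mask_identifiers(text):
    """Replace e-mail addresses and known names with placeholders."""
    text = re.sub(r"[\w.+-]+@[\w.-]+", "[EMAIL]", text)
    for name in KNOWN_NAMES:
        text = text.replace(name, "[NAME]")
    return text


def tokenize(text):
    """Lowercased words plus individual punctuation marks, which carry style."""
    return re.findall(r"[a-z]+|[^\sa-z]", text.lower())


def make_idf(token_lists):
    """Smoothed inverse document frequency over the profile documents."""
    n = len(token_lists)
    df = Counter(tok for toks in token_lists for tok in set(toks))
    return lambda tok: math.log((1 + n) / (1 + df[tok])) + 1.0


def vectorize(tokens, idf):
    return {tok: count * idf(tok) for tok, count in Counter(tokens).items()}


def cosine(u, v):
    dot = sum(w * v.get(tok, 0.0) for tok, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


# Toy corpus: one already-masked sample per source organization.
SAMPLES = {
    "org_a": mask_identifiers(
        "Dear team, please find attached the quarterly report. Kindly revert "
        "at your earliest convenience. Regards, Alice Smith alice@corp-a.example"),
    "org_b": mask_identifiers(
        "hey!! got the numbers... looks fine imo. ping me if anything's off, "
        "thx -- Bob bob@corp-b.example"),
}
TOKENS = {src: tokenize(text) for src, text in SAMPLES.items()}
IDF = make_idf(list(TOKENS.values()))
PROFILES = {src: vectorize(toks, IDF) for src, toks in TOKENS.items()}


def attribute(masked_text):
    """Return the source whose style profile is closest to the masked text."""
    query = vectorize(tokenize(masked_text), IDF)
    return max(PROFILES, key=lambda src: cosine(query, PROFILES[src]))


new_message = mask_identifiers(
    "Dear colleagues, kindly find attached the revised budget. "
    "Regards, Alice Smith alice@corp-a.example")
print(new_message)
print(attribute(new_message))
```

The masked message contains no name or address, but it is still matched to the organization whose samples share its formal phrasing. This is the kind of residual signal that a style subspace kept local is meant to remove.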

Topics

Privacy-Preserving AI
Text Sanitization
Distributed Agents
Disentangled Representations
Federated Learning
PII Exposure

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.