Privacy-Preserving Text Sanitization for Distributed Agents Collaboration via Disentangled Representations
Summary
DiSan (Disentangled Sanitization) is a privacy-preserving text sanitization framework for multi-agent collaboration, integrated into Intern-Shannon. It targets privacy leakage that goes beyond explicit identifiers: when distributed agents exchange text across organizational boundaries, distributional signatures such as formatting and vocabulary choices can still reveal the source. DiSan uses a two-stream encoder to factorize text into a source-invariant role subspace, which preserves task semantics, and a source-identifying style subspace, which stays local. Federated prototype alignment and adversarial regularization allow joint training without centralizing raw text. In the reported experiments, identifier-level masking reduces TF-IDF stylometric attribution by only 18.6%, even when 19.2% of tokens are masked. By contrast, DiSan reduces answer-level PII exposure 20-fold while maintaining 83% answer faithfulness on a distributed multi-agent RAG benchmark, and lowers Enron stylometric attribution by 73.2% under TF-IDF and by 70.6% under a neural probe.
Key takeaway
For AI security engineers designing multi-agent collaboration systems, identifier masking alone is inadequate for text privacy. Consider sanitization frameworks such as DiSan that disentangle stylistic from semantic information. On the reported distributed RAG benchmark, this approach reduced answer-level PII exposure 20-fold while maintaining 83% answer faithfulness, a level of protection that simple redaction does not provide.
Key insights
Privacy leakage in multi-agent text collaboration extends beyond explicit identifiers to stylistic patterns, necessitating disentangled sanitization.
Principles
- Privacy leakage includes distributional signatures.
- Disentangling style from semantics is key.
- Federated learning enables privacy-preserving training.
Method
DiSan employs a two-stream encoder to factorize text into a source-invariant role subspace and a source-identifying style subspace. Federated prototype alignment and adversarial regularization enable joint training without centralizing raw text.
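To make the idea of federated prototype alignment more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not DiSan's actual procedure: the summary does not specify how prototypes are computed or aggregated, so the per-label mean, the count-weighted server average, and the squared-distance alignment term are all assumptions, and the names `local_prototypes`, `aggregate_prototypes`, and `alignment_loss` are hypothetical. The property it shows is that only per-label prototype vectors and counts leave a client, never raw text.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Vector = List[float]
Prototypes = Dict[str, Tuple[Vector, int]]  # label -> (mean role embedding, count)


def local_prototypes(role_embeddings: List[Vector], labels: List[str]) -> Prototypes:
    """Client side: average the role-subspace embeddings per label.

    Assumption: embeddings come from a role encoder that is not shown here.
    """
    sums: Dict[str, Vector] = {}
    counts: Dict[str, int] = defaultdict(int)
    for vec, label in zip(role_embeddings, labels):
        if label not in sums:
            sums[label] = [0.0] * len(vec)
        sums[label] = [s + v for s, v in zip(sums[label], vec)]
        counts[label] += 1
    return {
        label: ([s / counts[label] for s in total], counts[label])
        for label, total in sums.items()
    }


def aggregate_prototypes(client_protos: List[Prototypes]) -> Dict[str, Vector]:
    """Server side: count-weighted average of client prototypes (an assumed rule)."""
    sums: Dict[str, Vector] = {}
    totals: Dict[str, int] = defaultdict(int)
    for protos in client_protos:
        for label, (vec, n) in protos.items():
            if label not in sums:
                sums[label] = [0.0] * len(vec)
            sums[label] = [s + n * v for s, v in zip(sums[label], vec)]
            totals[label] += n
    return {label: [s / totals[label] for s in total] for label, total in sums.items()}


def alignment_loss(local: Prototypes, global_protos: Dict[str, Vector]) -> float:
    """Mean squared distance between local and global prototypes over shared labels."""
    total, shared = 0.0, 0
    for label, (vec, _) in local.items():
        if label in global_protos:
            total += sum((a - b) ** 2 for a, b in zip(vec, global_protos[label]))
            shared += 1
    return total / shared if shared else 0.0


# Two hypothetical organizations with toy 2-d role embeddings for one label.
org_a = local_prototypes([[1.0, 0.0], [3.0, 0.0]], ["query", "query"])
org_b = local_prototypes([[0.0, 4.0]], ["query"])
shared_protos = aggregate_prototypes([org_a, org_b])
print(alignment_loss(org_a, shared_protos))
```

In a training loop, each client would add a term like `alignment_loss` to its local objective so that role embeddings from different organizations converge on the same prototypes. The adversarial regularizer mentioned above is a separate component and is not modeled here.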
In practice
- Identifier masking is insufficient for privacy.
- DiSan reduces answer-level PII exposure 20-fold on a distributed multi-agent RAG benchmark.
- DiSan maintains 83% answer faithfulness after sanitization.
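The first point, that identifier masking leaves stylistic signal intact, can be illustrated with a toy stylometric attacker. The sketch below is not the paper's evaluation setup: the four-message corpus is invented, the name list `KNOWN_NAMES` is hypothetical, and the attacker is a hand-rolled TF-IDF nearest-centroid classifier written with the standard library only. It masks e-mail addresses and names, then checks whom the attacker attributes a masked message to.

```python
import math
import re
from collections import Counter, defaultdict

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
TOKEN_RE = re.compile(r"[a-z']+|[!?.,;]")  # words plus punctuation as style cues
KNOWN_NAMES = ("Alice", "Bob")  # hypothetical identifier list

# Invented toy corpus: (author, message).
TRAIN = [
    ("alice", "Hi team!! Alice here, ping me at alice@corp.example. Super excited about the launch!!"),
    ("alice", "Hey all!! Alice again. Super quick note, the demo went great!!"),
    ("bob", "Dear colleagues, pursuant to our discussion, please find the report attached. Regards, Bob (bob@corp.example)."),
    ("bob", "Dear colleagues, kindly note that the meeting is postponed. Regards, Bob."),
]


def mask_identifiers(text):
    """Identifier-level masking: replace e-mail addresses and known names."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    for name in KNOWN_NAMES:
        text = re.sub(rf"\b{re.escape(name)}\b", "[NAME]", text)
    return text


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def fit_idf(docs):
    """Smoothed inverse document frequency over the training documents."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    n = len(docs)
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def vectorize(text, idf):
    """L2-normalized TF-IDF vector; terms unseen in training are dropped."""
    tf = Counter(tokenize(text))
    vec = {t: c * idf[t] for t, c in tf.items() if t in idf}
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attribute(text, train):
    """Nearest-centroid attribution: return the author whose centroid is closest."""
    idf = fit_idf([doc for _, doc in train])
    centroids = defaultdict(lambda: defaultdict(float))
    for author, doc in train:
        for term, weight in vectorize(doc, idf).items():
            centroids[author][term] += weight
    probe = vectorize(text, idf)
    return max(centroids, key=lambda author: cosine(probe, centroids[author]))


masked_train = [(author, mask_identifiers(doc)) for author, doc in TRAIN]
message = "Hi all!! Alice here. Super happy with the numbers, reach me at alice@corp.example!!"
masked_message = mask_identifiers(message)
print(masked_message)
print(attribute(masked_message, masked_train))
```

On this toy corpus the masked message contains no name or address, yet the attacker still attributes it to `alice`, because punctuation habits and word choice survive masking. This is the kind of distributional signature that a style/role factorization is meant to keep local.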
Topics
- Privacy-Preserving AI
- Text Sanitization
- Distributed Agents
- Disentangled Representations
- Federated Learning
- PII Exposure
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, AI Security Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.