Subject-level Inference for Realistic Text Anonymization Evaluation
Summary
The SPIA (Subject-level PII Inference Assessment) benchmark evaluates text anonymization per data subject rather than per masked span. The benchmark comprises 675 documents across legal and online domains and targets two gaps in traditional span-based evaluation: it ignores what an adversary can still infer from the remaining text, and it does not handle documents that mention several people. Experiments with 4 anonymization methods and 6 LLM backbones show that even when more than 90% of PII spans are masked, subject-level inference protection can fall to 33%, indicating significant residual privacy risk. Anonymization strategies focused on a single target subject also often leave non-target subjects substantially more exposed, with protection gaps of up to 11 percentage points. The study further finds that anonymization effectiveness varies significantly by document type, which calls for domain-aware evaluation.
Key takeaway
For engineering teams developing or deploying text anonymization solutions, relying solely on span-based metrics like token or entity recall is insufficient and creates a false sense of security. You should integrate subject-level inference evaluation using benchmarks like SPIA to accurately assess residual privacy risks, especially in multi-subject documents. Prioritize anonymization techniques that explicitly protect all individuals and adapt strategies based on document domain and length to ensure robust and equitable privacy safeguards.
Key insights
Subject-level inference evaluation is crucial for realistic text anonymization, as span masking alone is insufficient.
Principles
- Span-based metrics overestimate privacy protection.
- Single-subject anonymization creates protection inequality.
- Anonymization effectiveness is domain-dependent.
Method
SPIA employs a two-stage framework: it first identifies all data subjects within a document, then infers 15 PII categories for each subject. Protection is scored with two new metrics, the Individual Protection Rate (IPR) and the Collective Protection Rate (CPR).
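This summary does not give the formulas for IPR and CPR, so the sketch below only illustrates the general idea of scoring protection per subject and then across subjects. It assumes that IPR is the fraction of a subject's PII categories the adversary fails to infer from the anonymized text, and that CPR pools the same count over every subject in the document. Both definitions, along with the `SubjectResult` structure and the sample data, are hypothetical and may differ from the paper's.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubjectResult:
    """Inference outcome for one data subject (hypothetical structure)."""
    subject_id: str
    # PII category -> True if the adversary still inferred it correctly
    # from the anonymized text, False if it stayed protected.
    inferred: Dict[str, bool]


def individual_protection_rate(result: SubjectResult) -> float:
    """Assumed IPR: share of one subject's PII categories NOT inferred."""
    if not result.inferred:
        return 1.0  # nothing to leak for this subject
    protected = sum(1 for leaked in result.inferred.values() if not leaked)
    return protected / len(result.inferred)


def collective_protection_rate(results: List[SubjectResult]) -> float:
    """Assumed CPR: share of protected (subject, category) pairs, pooled."""
    total = sum(len(r.inferred) for r in results)
    if total == 0:
        return 1.0
    protected = sum(
        1 for r in results for leaked in r.inferred.values() if not leaked
    )
    return protected / total


# Toy document with a target subject and one bystander (made-up values).
results = [
    SubjectResult(
        "target",
        {"name": False, "age": False, "location": True, "occupation": False},
    ),
    SubjectResult("bystander", {"name": True, "location": True}),
]

for r in results:
    print(r.subject_id, individual_protection_rate(r))
print("collective", collective_protection_rate(results))
```

In this toy case the target looks well protected while the bystander is fully exposed, which is the kind of inequality a per-subject score makes visible and a single span-recall number hides.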
In practice
- Use SPIA for multi-subject, inference-based privacy assessment.
- Prioritize LLM-based anonymization for better contextual protection.
- Tailor anonymization strategies to document characteristics.
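One way to act on these points is to report protection separately per document domain and per subject role, so that a gap between target and non-target subjects is not averaged away. The sketch below is a minimal illustration, assuming per-subject protection rates have already been computed by some subject-level evaluation; the row layout, the role labels, and the numbers are hypothetical toy values, not results from the paper.

```python
from collections import defaultdict
from statistics import mean

# (domain, role, protection_rate) rows; toy values for illustration only.
rows = [
    ("legal", "target", 0.80),
    ("legal", "non_target", 0.70),
    ("online", "target", 0.60),
    ("online", "non_target", 0.50),
    ("online", "non_target", 0.40),
]


def protection_gap_by_domain(rows):
    """Return target-minus-non-target protection gap per domain,
    in percentage points. Domains missing either role are skipped."""
    grouped = defaultdict(lambda: defaultdict(list))
    for domain, role, rate in rows:
        grouped[domain][role].append(rate)

    gaps = {}
    for domain, roles in grouped.items():
        if roles["target"] and roles["non_target"]:
            gap = mean(roles["target"]) - mean(roles["non_target"])
            gaps[domain] = gap * 100
    return gaps


for domain, gap in protection_gap_by_domain(rows).items():
    print(f"{domain}: target subjects are {gap:.1f} points better protected")
```

A positive gap means non-target subjects are more exposed than the target in that domain, which is a signal to revisit the anonymization strategy for that document type.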
Topics
- Text Anonymization
- SPIA Benchmark
- Subject-level PII Inference
- Privacy Protection Metrics
- Large Language Models
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, NLP Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.