Subject-level Inference for Realistic Text Anonymization Evaluation

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy) · Depth: Expert, extended

Summary

The SPIA (Subject-level PII Inference Assessment) benchmark evaluates text anonymization at the level of individual data subjects rather than masked spans. It comprises 675 documents across legal and online domains and targets two gaps in span-based evaluation: adversarial inference from the text that remains, and documents that mention more than one person. Experiments with 4 anonymization methods and 6 LLM backbones show that even when over 90% of PII spans are masked, subject-level inference protection can drop to 33%, which indicates significant residual privacy risk. Anonymization strategies that focus on a single target subject also tend to leave non-target subjects substantially more exposed, with protection gaps of up to 11 percentage points. Effectiveness varies significantly by document type, so evaluation needs to be domain-aware.

Key takeaway

For engineering teams developing or deploying text anonymization solutions, relying solely on span-based metrics like token or entity recall is insufficient and creates a false sense of security. You should integrate subject-level inference evaluation using benchmarks like SPIA to accurately assess residual privacy risks, especially in multi-subject documents. Prioritize anonymization techniques that explicitly protect all individuals and adapt strategies based on document domain and length to ensure robust and equitable privacy safeguards.
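One way to act on this advice is to report how much better the target subject is protected than everyone else in the same document. The sketch below is illustrative only: the function names, the boolean "protected" outcomes, and the toy subjects are assumptions made for this example, not part of SPIA or its released code.

```python
from typing import Dict, List


def protection_rate(outcomes: List[bool]) -> float:
    """Share of attacker inferences that failed (True means protected)."""
    if not outcomes:
        raise ValueError("need at least one inference outcome")
    return sum(outcomes) / len(outcomes)


def target_gap_pp(by_subject: Dict[str, List[bool]], target: str) -> float:
    """Target protection minus pooled non-target protection, in percentage points."""
    others = [
        outcome
        for subject, outcomes in by_subject.items()
        if subject != target
        for outcome in outcomes
    ]
    return 100.0 * (protection_rate(by_subject[target]) - protection_rate(others))


# Toy outcomes (hypothetical): one entry per PII category the attacker tried to infer.
by_subject = {
    "defendant": [True, True, True, False],  # target subject, 3 of 4 protected
    "witness": [True, False, False, True],
    "clerk": [True, False],
}

gap = target_gap_pp(by_subject, target="defendant")
print(f"target vs non-target gap: {gap:.1f} pp")
```

A positive gap means non-target subjects are more exposed than the target, which is the pattern the summary above describes. Pooling non-target outcomes is one design choice among several; averaging per-subject rates instead would weight each person equally.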

Key insights

Subject-level inference evaluation is crucial for realistic text anonymization, as span masking alone is insufficient.

Principles

Method

SPIA employs a two-stage framework: identifying all data subjects within a document, then inferring 15 PII categories for each subject, using novel Individual Protection Rate (IPR) and Collective Protection Rate (CPR) metrics.
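The two stages can be pictured as producing one record per (subject, PII category) pair, which is then scored per person and across the whole document. The sketch below is a minimal illustration under stated assumptions: the `Inference` record, the exact-match comparison, and the way the two rates are aggregated are hypothetical readings of "individual" and "collective" protection, not the paper's formal IPR and CPR definitions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Inference:
    """Hypothetical record: one PII category for one data subject."""
    subject: str
    category: str
    truth: str             # annotated ground-truth value
    guess: Optional[str]   # attacker's guess on the anonymized text, None if no guess


def is_protected(record: Inference) -> bool:
    """Assumed criterion: protected unless the guess matches the truth exactly."""
    if record.guess is None:
        return True
    return record.guess.strip().lower() != record.truth.strip().lower()


def protection_rates(records: Iterable[Inference]) -> Tuple[Dict[str, float], float]:
    """Return (per-subject rates, pooled rate over all subjects)."""
    protected: Dict[str, int] = defaultdict(int)
    total: Dict[str, int] = defaultdict(int)
    for record in records:
        total[record.subject] += 1
        protected[record.subject] += int(is_protected(record))
    if not total:
        raise ValueError("need at least one inference record")
    individual = {subject: protected[subject] / total[subject] for subject in total}
    collective = sum(protected.values()) / sum(total.values())
    return individual, collective


# Toy records (hypothetical) for a two-subject document.
records = [
    Inference("plaintiff", "age", "34", "34"),                # inferred, so leaked
    Inference("plaintiff", "occupation", "nurse", None),      # no guess, so protected
    Inference("witness", "age", "51", "51"),                  # leaked
    Inference("witness", "occupation", "teacher", "Teacher"), # leaked (case-insensitive)
]

individual, collective = protection_rates(records)
print(individual, collective)
```

Exact string matching is a simplification chosen to keep the example short; a realistic evaluation would need a more tolerant comparison for free-text attributes.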

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, NLP Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.