Large-scale online deanonymization with LLMs (Simon Lermen, Daniel Paleka et al.)
Summary
A recent paper by Simon Lermen, Daniel Paleka et al. (arXiv:2602.16800, 2026) demonstrates that Large Language Models (LLMs) pose a significant new threat to online pseudonymous privacy by enabling large-scale, cost-effective deanonymization. Identifying individuals from unstructured text was traditionally difficult, but LLMs automate the process, which undermines the assumption of "practical obscurity." The research supports this with multiple experiments. Open-world autonomous agent attacks achieved 67% recall at 90% precision on Hacker News (HN) accounts for $1–$4 per person. A closed-world ESRC (Extract, Search, Reason, Calibrate) framework, matching HN accounts to LinkedIn profiles across a pool of about 89k candidates, reached 55.2% recall at 90% precision, significantly outperforming baselines. This framework also showed robustness at 100M scale, estimating 45% recall at 90% precision with 1M candidates, and achieved up to 68% recall at 90% precision in Reddit cross-community matching. These findings indicate that LLMs can process unstructured data at scale, fundamentally altering online privacy threat models.
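The results above are all reported as "recall at 90% precision". As a minimal sketch of how such a figure can be computed (the function name, the input format, and the toy data are assumptions for illustration, not the paper's evaluation code): rank the predicted matches by confidence, then find the largest share of true matches recovered while at least 90% of the accepted predictions are correct.

```python
def recall_at_precision(scored, total_true, min_precision=0.90):
    """Best recall over confidence cutoffs whose precision meets the target.

    scored: list of (confidence, is_correct) pairs, one per predicted match.
    total_true: number of accounts that actually have a true match.
    Assumes confidences are distinct, so every prefix is a valid cutoff.
    """
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    correct = 0
    for accepted, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        precision = correct / accepted
        if precision >= min_precision:
            best_recall = max(best_recall, correct / total_true)
    return best_recall


# Hypothetical toy predictions: (model confidence, whether the match was right).
SCORED = [(0.99, True), (0.95, True), (0.90, True),
          (0.80, False), (0.70, True), (0.60, False)]

print(recall_at_precision(SCORED, total_true=5))  # 0.6
```

In this toy data only the three most confident predictions can be accepted before precision drops below 90%, so 3 of the 5 true matches are recovered.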
Key takeaway
Platform developers and privacy engineers designing online services should urgently rethink their pseudonymous protection mechanisms. Assumptions of "practical obscurity" no longer hold, because LLMs enable automated, cost-effective deanonymization at scale. Implement stricter de-identification measures for unstructured text data, educate users about the increased risk of sharing personal details even under a pseudonym, and consider restricting data export on your platform to mitigate these new vulnerabilities.
Key insights
LLMs automate and scale online deanonymization, rendering traditional pseudonymous privacy ineffective against low-cost, large-scale attacks.
Principles
- LLMs reduce investigation costs significantly, to $1–$4 per person in the open-world experiments.
- Context integration (the Reason step) reduces false positives.
- "Practical obscurity" is no longer a valid assumption.
Method
The ESRC framework extracts structured micro-data, uses semantic embeddings for candidate search, performs deep reasoning for verification, and calibrates confidence scores.
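The four ESRC stages can be illustrated with a toy sketch on synthetic data. Every component below is a deliberately simple stand-in and an assumption of this example, not the paper's implementation: keyword tokens replace LLM-extracted micro-data, bag-of-words cosine similarity replaces semantic embeddings, a term-overlap ratio replaces LLM reasoning, and a fixed threshold replaces calibrated confidence. All function names, profiles, and thresholds are hypothetical.

```python
import math
from collections import Counter


def extract(text):
    """Extract (stand-in): keep lowercase tokens longer than three characters."""
    tokens = (w.strip(".,;:!?()").lower() for w in text.split())
    return Counter(t for t in tokens if len(t) > 3)


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def search(query, pool, k=2):
    """Search (stand-in): shortlist the k most similar candidates."""
    ranked = sorted(pool, key=lambda name: cosine(query, pool[name]), reverse=True)
    return ranked[:k]


def reason(query, shortlist, pool):
    """Reason (stand-in): pick the candidate sharing the most query terms."""
    def overlap(name):
        return len(query.keys() & pool[name].keys()) / max(len(query), 1)
    best = max(shortlist, key=overlap)
    return best, overlap(best)


def calibrate(name, confidence, threshold=0.5):
    """Calibrate (stand-in): abstain (return None) below the threshold."""
    return name if confidence >= threshold else None


def link(post, profiles, threshold=0.5):
    """Run the four stages in order on one post and a candidate pool."""
    query = extract(post)
    pool = {name: extract(text) for name, text in profiles.items()}
    shortlist = search(query, pool)
    best, confidence = reason(query, shortlist, pool)
    return calibrate(best, confidence, threshold)


# Entirely synthetic candidate pool.
PROFILES = {
    "profile_a": "Compiler engineer in Lisbon working on Rust tooling",
    "profile_b": "Pastry chef in Oslo who teaches sourdough classes",
    "profile_c": "Marine biologist in Perth studying coral reefs",
}

print(link("Rust compiler tooling engineer based in Lisbon", PROFILES))
print(link("I like hiking on weekends", PROFILES))
```

The first call returns a match because most of the post's terms recur in one profile. The second abstains, which is the role of the final stage: declining to answer is what keeps precision high when evidence is weak.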
In practice
- Use LLM agents for open-world identity resolution.
- Combine fast and powerful LLMs for cost optimization.
- Re-evaluate data release policies for unstructured text.
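The second item above, pairing a fast model with a powerful one, amounts to a two-stage cascade: score every candidate cheaply, then spend the expensive model only on a shortlist. The sketch below shows the cost arithmetic under assumed values; the scoring functions and the per-call costs (1 and 50 units) are hypothetical placeholders, not figures from the paper.

```python
def cascade(items, cheap_score, strong_score, keep=3, cheap_cost=1, strong_cost=50):
    """Rank all items with the cheap scorer, re-score only the top `keep`.

    Returns the item preferred by the strong scorer and the total cost
    in arbitrary units (cheap calls on everything, strong calls on the shortlist).
    """
    shortlist = sorted(items, key=cheap_score, reverse=True)[:keep]
    best = max(shortlist, key=strong_score)
    cost = cheap_cost * len(items) + strong_cost * len(shortlist)
    return best, cost


# Hypothetical scorers: the cheap one is roughly right, the strong one is exact.
candidates = list(range(10))
best, cost = cascade(
    candidates,
    cheap_score=lambda x: -abs(x - 6),
    strong_score=lambda x: -abs(x - 7),
)
print(best, cost)  # 7 160
```

Running the strong scorer on all ten candidates would cost 500 units under the same assumptions, so the cascade finds the same answer for 160. The trade-off is that the right candidate must survive the cheap filter.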
Topics
- Large Language Models
- Online Deanonymization
- Privacy Threat Models
- ESRC Framework
- Pseudonymous Accounts
- Automated Attacks
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Policy Maker
Editorial summary, takeaway, and curation by AIssential. Original article published by LLM on Medium.