Large-scale online deanonymization with LLMs (Simon Lermen, Daniel Paleka et al., 2026)

2026-06-20 · Source: LLM on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, medium

Summary

A recent paper by Simon Lermen, Daniel Paleka et al. (arXiv:2602.16800, 2026) shows that Large Language Models (LLMs) pose a significant new threat to pseudonymous privacy online because they make deanonymization cheap enough to run at scale. Identifying individuals from unstructured text has traditionally required costly manual investigation. LLMs automate this work, which undermines the assumption of "practical obscurity." The authors support this claim with several experiments. In an open-world setting, autonomous LLM agents deanonymized Hacker News (HN) accounts with 67% recall at 90% precision, at a cost of $1–$4 per person. In a closed-world setting, the ESRC (Extract, Search, Reason, Calibrate) framework matched HN accounts to LinkedIn profiles across a pool of roughly 89k candidates, reaching 55.2% recall at 90% precision and significantly outperforming baselines. The framework also proved robust in scaling experiments up to the 100M scale, with an estimated 45% recall at 90% precision at 1M candidates, and it achieved up to 68% recall at 90% precision in Reddit cross-community matching. Together, these findings indicate that LLMs can process unstructured data at scale, fundamentally altering the threat models for online privacy.
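
The headline metric, recall at 90% precision, can be computed as in the sketch below. It is a minimal illustration under stated assumptions, not the authors' evaluation code: each pseudonymous account is assumed to receive one best-guess match with a confidence score, the system only answers when that score clears a threshold, and recall is measured against all accounts that truly have a match in the candidate pool. The function name `recall_at_precision` and the toy data are hypothetical.

```python
from typing import Sequence


def recall_at_precision(scores: Sequence[float], correct: Sequence[bool],
                        num_targets: int, target_precision: float = 0.9) -> float:
    """Best recall over all confidence thresholds whose precision >= target.

    Hypothetical helper (not from the paper):
      scores[i]    confidence of the best-guess match for account i
      correct[i]   True if that guess is the right identity
      num_targets  number of accounts that truly have a match in the pool
    """
    if num_targets <= 0:
        return 0.0
    # Rank guesses from most to least confident; answering the top k guesses
    # corresponds to one confidence threshold. Tied scores are treated as
    # separate thresholds, which is a simplification.
    ranked = sorted(zip(scores, correct), key=lambda pair: pair[0], reverse=True)
    best_recall = 0.0
    true_positives = 0
    for answered, (_, is_correct) in enumerate(ranked, start=1):
        true_positives += int(is_correct)
        precision = true_positives / answered
        if precision >= target_precision:
            best_recall = max(best_recall, true_positives / num_targets)
    return best_recall


# Toy data: five best guesses, four of them correct, five true matches overall.
print(recall_at_precision([0.95, 0.9, 0.8, 0.7, 0.6],
                          [True, True, True, False, True], num_targets=5))
```

On this toy data the function returns 0.6: only the three most confident guesses can be answered while keeping precision at or above 0.9, so three of the five true matches are recovered. Lowering the target precision to 0.8 lets all five guesses through and raises recall to 0.8, which is the precision/recall trade-off the reported figures fix at 90% precision.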

Key takeaway

If you design online services as a platform developer or privacy engineer, you must urgently rethink how pseudonymous users are protected. Assumptions about "practical obscurity" are obsolete now that LLMs make deanonymization automated, cheap, and scalable. Apply stricter de-identification to unstructured text data, and warn users that sharing personal details carries greater risk even under a pseudonym. Also consider restricting data export on your platform to reduce exposure to these new attacks.
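
One small part of stricter de-identification is scrubbing explicit identifiers before text is published or exported. The sketch below is a minimal illustration that assumes regex-based redaction; the `redact` function and its `PATTERNS` list are hypothetical and far from exhaustive. It also shows the limit of this approach: it does nothing about the subtle micro-data (occupation, location, hobbies) that an ESRC-style attack extracts, so redaction like this is unlikely to be sufficient on its own.

```python
import re

# Hypothetical, non-exhaustive patterns for explicit identifiers.
# Order matters: emails are replaced before bare @handles.
PATTERNS = [
    ("EMAIL", re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")),
    ("URL", re.compile(r"https?://\S+")),
    ("HANDLE", re.compile(r"(?<!\w)@\w{2,}")),
    ("PHONE", re.compile(r"\+?\d[\d\s().-]{7,}\d")),
]


def redact(text: str) -> str:
    """Replace explicit identifiers with placeholder tags such as [EMAIL].

    Indirect micro-data (job, city, hobbies) is left untouched, so this
    reduces, but does not remove, re-identification risk.
    """
    for label, pattern in PATTERNS:
        text = pattern.sub(f"[{label}]", text)
    return text


print(redact("Mail alice@example.com, see https://example.com/u/alice "
             "or ping @alice_dev at +1 (555) 123-4567."))
print(redact("I'm a nurse in Leeds who rescues greyhounds."))
```

The first call returns "Mail [EMAIL], see [URL] or ping [HANDLE] at [PHONE].", while the second sentence passes through unchanged even though it contains exactly the kind of micro-data an LLM can combine to narrow down an identity.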

Key insights

LLMs automate and scale online deanonymization, rendering traditional pseudonymous privacy ineffective against low-cost, large-scale attacks.

Principles

LLMs reduce investigation costs significantly.
Context integration (Reason step) reduces false positives.
"Practical obscurity" is no longer valid.

Method

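The ESRC framework works in four steps. Extract pulls structured micro-data out of a user's text; Search uses semantic embeddings to shortlist candidate profiles; Reason verifies candidate matches through deep reasoning; and Calibrate produces calibrated confidence scores for the final matches.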
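
The Search step can be pictured as a nearest-neighbour lookup over embedding vectors. This is a minimal sketch, assuming the Extract step has already turned each profile into a fixed-length vector and that cosine similarity is the ranking measure; the toy vectors, `cosine`, and `top_k_candidates` are hypothetical, and the Extract, Reason, and Calibrate steps, which rely on an LLM, are not shown.

```python
import heapq
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_candidates(query: Vector, candidates: Dict[str, Vector],
                     k: int = 3) -> List[Tuple[str, float]]:
    """Hypothetical Search step: shortlist the k most similar candidate profiles.

    A linear scan for clarity; the shortlist would then go to the Reason step.
    """
    scored = ((cid, cosine(query, vec)) for cid, vec in candidates.items())
    return heapq.nlargest(k, scored, key=lambda item: item[1])


# Toy 3-dimensional "embeddings" standing in for real model outputs.
pool = {
    "profile_a": [1.0, 0.0, 0.0],
    "profile_b": [0.9, 0.1, 0.0],
    "profile_c": [0.0, 1.0, 0.0],
    "profile_d": [0.0, 0.0, 1.0],
}
print(top_k_candidates([1.0, 0.0, 0.0], pool, k=2))
```

The design point is that Search is cheap and only narrows the pool, leaving the expensive Reason step to run on a handful of candidates. At the ~89k-candidate scale and beyond, a production system would more likely use an approximate nearest-neighbour index than this linear scan.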

In practice

Use LLM agents for open-world identity resolution.
Combine fast and powerful LLMs for cost optimization.
Re-evaluate data release policies for unstructured text.
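
One common way to combine fast and powerful models is a cascade: the fast model scores every shortlisted candidate, and only the top few are passed to the powerful model. The sketch below assumes that design; the paper may combine models differently. `cascade_verify`, the `fast` and `powerful` scorers, and the `keep` parameter are hypothetical stand-ins for LLM calls that return a match probability.

```python
from typing import Callable, List, Tuple

# Hypothetical scorer type: returns an estimated probability that the
# candidate profile belongs to the same person as the query text.
Scorer = Callable[[str, str], float]


def cascade_verify(query: str, shortlist: List[str], fast: Scorer,
                   powerful: Scorer, keep: int = 2) -> Tuple[str, float]:
    """Rank every candidate with the fast model, re-score only the top `keep`
    with the powerful model, and return the best (candidate, score) pair."""
    if not shortlist:
        raise ValueError("shortlist must not be empty")
    ranked = sorted(shortlist, key=lambda c: fast(query, c), reverse=True)
    finalists = ranked[:max(1, keep)]
    return max(((c, powerful(query, c)) for c in finalists),
               key=lambda pair: pair[1])


# Toy scorers with call counters, standing in for real LLM calls.
calls = {"fast": 0, "powerful": 0}


def toy_fast(query: str, cand: str) -> float:
    calls["fast"] += 1
    return {"x": 0.2, "y": 0.7, "z": 0.6, "w": 0.1}[cand]


def toy_powerful(query: str, cand: str) -> float:
    calls["powerful"] += 1
    return {"x": 0.1, "y": 0.4, "z": 0.95, "w": 0.0}[cand]


print(cascade_verify("pseudonymous post history", ["x", "y", "z", "w"],
                     toy_fast, toy_powerful, keep=2), calls)
```

With four shortlisted candidates and `keep=2`, the powerful model is called twice instead of four times, and in this toy run it overturns the fast model's top pick ("y") in favour of "z". The saving grows with shortlist size, which is where the cost reduction would come from.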

Topics

Large Language Models
Online Deanonymization
Privacy Threat Models
ESRC Framework
Pseudonymous Accounts
Automated Attacks

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, AI Security Engineer, Policy Maker

Editorial summary, takeaway, and curation by AIssential. Original article published by LLM on Medium.