Researchers just proved that LLMs can deanonymize pseudonymous users at scale with off-the-shelf tools and a sandwich budget. Here’s what that actually means.

Think about the last time you posted something online under a username that wasn’t your real name. Maybe a Reddit account where you discuss health issues. A Hacker News profile where you ask professional questions you wouldn’t want tied to your LinkedIn. A forum where you talk about things that are genuinely private.
You probably assumed that without your real name attached, those posts were reasonably safe. That connecting them back to you would require serious effort, resources, or technical expertise. That practical obscurity was doing the job that explicit privacy couldn’t.
A paper published in February 2026 by researchers at ETH Zurich, with input from a researcher at Anthropic, argues that assumption is now broken. Not weakened. Broken.
What They Built and What It Does
The research team built an automated pipeline that takes a pseudonymous online profile and tries to find the real person behind it. No human investigators. No privileged access to platform data. Just publicly available posts, off-the-shelf language models, and standard APIs.
The pipeline works in four steps. First, it reads everything a pseudonymous user has written and extracts identity signals: where they seem to be located, what they do professionally, how they write, what they care about, incidental details they’ve dropped in passing. Second, it converts this profile into a searchable embedding and scans millions of candidate identities to find the closest matches. Third, it uses a more capable LLM to reason over the top candidates, cross-referencing evidence and eliminating false leads. Fourth, it produces a confidence score so the attacker can decide when to commit to a guess.
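To make the four steps concrete, here is a deliberately toy sketch in Python. It is not the researchers’ code (which is unpublished), and every function name in it is made up for illustration. Word counts stand in for LLM extraction, cosine similarity over those counts stands in for embedding search, a shared-term count stands in for LLM reasoning, and a simple margin rule stands in for the confidence score.

```python
import math
import re
from collections import Counter


def extract_signals(posts):
    # Step 1 stand-in: the paper uses an LLM here; this toy just counts words.
    return Counter(re.findall(r"[a-z]+", " ".join(posts).lower()))


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def shortlist(profile, candidates, k=2):
    # Step 2 stand-in: nearest-neighbour search over candidate vectors.
    ranked = sorted(candidates, key=lambda n: cosine(profile, candidates[n]), reverse=True)
    return ranked[:k]


def rerank(profile, names, candidates):
    # Step 3 stand-in: a stronger model would weigh evidence; here we simply
    # count how many distinct terms each shortlisted candidate shares.
    scores = {n: len(profile.keys() & candidates[n].keys()) for n in names}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


def decide(ranked, min_margin=2):
    # Step 4 stand-in: commit only if the best candidate clearly beats the
    # runner-up; otherwise abstain by returning None.
    best_name, best_score = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    return best_name if best_score - runner_up >= min_margin else None


# Invented example data, not taken from the paper.
posts = ["I maintain a Rust compiler plugin in Zurich", "Cycling to the lab in Zurich again"]
candidates = {
    "alice": extract_signals(["Rust compiler engineer based in Zurich, keen on cycling"]),
    "bob": extract_signals(["Sales manager in Boston who enjoys golf"]),
    "carol": extract_signals(["Python data analyst in Zurich"]),
}

profile = extract_signals(posts)
top = shortlist(profile, candidates)
guess = decide(rerank(profile, top, candidates))
print(guess)
```

The part worth noticing is the last step: the system is allowed to say nothing. Abstaining on weak matches is what lets an attacker trade coverage for accuracy, which is exactly what the precision and recall figures in the paper measure.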
The researchers tested it against three real datasets: Hacker News users linked to their LinkedIn profiles, Reddit users matched across different communities, and a set of partially anonymized interview transcripts with scientists, published by Anthropic.
The results are uncomfortable to read.
The Numbers You Should Sit With for a Moment
On the Hacker News to LinkedIn task, the classical best-practice attack (the approach security researchers considered state-of-the-art before LLMs) achieved a recall of roughly 0.1% at 99% precision. Meaning: for every 1,000 matchable users, it correctly identified one, while almost never being wrong when it did guess.
The LLM pipeline achieved 45.1% recall at 99% precision. Same standard for being right. Roughly four hundred and fifty times as many people identified.
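If “recall at 99% precision” feels abstract, the sketch below shows one common way to compute that kind of figure. It assumes each guess comes with a confidence score and a correct/incorrect label, and that the attacker only commits to guesses above some confidence cutoff. The function name and the sample data are hypothetical, not from the paper, and the paper’s exact evaluation procedure may differ.

```python
import math


def recall_at_precision(guesses, n_matchable, target_precision):
    """Best recall over all confidence cutoffs whose precision meets the target.

    guesses: list of (confidence, is_correct) pairs, one per attempted match.
    n_matchable: number of users who actually have a true match.
    Assumes confidence scores are distinct, so every prefix of the ranked
    list corresponds to a valid cutoff.
    """
    ranked = sorted(guesses, key=lambda g: g[0], reverse=True)
    best_recall = 0.0
    correct = 0
    for made, (_, is_correct) in enumerate(ranked, start=1):
        correct += is_correct
        if correct / made >= target_precision:
            best_recall = max(best_recall, correct / n_matchable)
    return best_recall


# Invented toy data: five guesses against ten matchable users.
guesses = [(0.99, True), (0.95, True), (0.90, True), (0.80, False), (0.70, True)]

strict = recall_at_precision(guesses, n_matchable=10, target_precision=0.99)
loose = recall_at_precision(guesses, n_matchable=10, target_precision=0.75)
print(strict, loose)

# The headline comparison from the article: 45.1% recall versus 0.1% recall.
print(round(45.1 / 0.1))
```

Demanding higher precision forces the attacker to stop guessing earlier, so recall drops. That is why holding precision fixed at 99% makes the comparison between the classical attack and the LLM pipeline a fair one.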
On the Anthropic Interviewer dataset, the system correctly identified 9 out of 33 scientists whose transcripts had been partially scrubbed of identifying information. These were people who participated in academic research and were told their identities would be protected. The pipeline identified more than a quarter of them anyway, without task decomposition tricks or special access, and 50% more effectively than prior work on the same dataset.
The cost per profile: $1 to $4. The total cost of the experiments: under $2,000.
Let those numbers land. A motivated actor with a modest budget and no special skills can now deanonymize tens of thousands of people. What previously required a skilled investigator spending hours on a single target can now be automated across an entire platform.
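For a sense of scale, here is the back-of-the-envelope arithmetic using only the per-profile range quoted above. The function name and the campaign sizes are illustrative assumptions, and real costs would depend on model pricing and how much text each profile contains.

```python
def campaign_cost(n_profiles, low_per_profile=1.0, high_per_profile=4.0):
    # Returns the (low, high) total cost in dollars for n_profiles targets,
    # assuming the $1 to $4 per-profile range reported in the article.
    return n_profiles * low_per_profile, n_profiles * high_per_profile


for n in (1_000, 10_000, 50_000):
    low, high = campaign_cost(n)
    print(f"{n:>6} profiles: ${low:,.0f} to ${high:,.0f}")
```

At these rates, ten thousand profiles comes to somewhere between $10,000 and $40,000: not nothing, but well within reach of an organization and far below the cost of paying investigators by the hour.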
Why LLMs Changed the Math
Before language models, deanonymization required structured data. The classic example is the Netflix Prize attack from 2008: researchers matched anonymous movie ratings to public IMDb profiles using statistical techniques. It worked, but only because the data was numerical and comparable. Unstructured text, the kind that makes up forum posts, comments, and discussions, was largely immune to these approaches. There was no good way to turn “someone’s Reddit comments” into a queryable signal.
LLMs changed that completely. They can read a hundred comments a person has made across different topics, extract the implicit signals (the way they describe their job without naming it, the city references that narrow down location, the vocabulary that suggests a particular background), and turn all of it into a structured profile. That profile can then be compared against millions of others at scale.
The researchers make a point worth highlighting: LLMs don’t succeed because they’re smarter than human investigators at identifying people. They succeed because they’re astronomically cheaper. The asymmetry between attack cost and defense cost has fundamentally shifted. A skilled human investigator spending two hours per person is expensive and slow. An LLM doing the same work in two minutes for three dollars is neither.
Who Can Actually Do This and to Whom
The paper is careful about methodology. The researchers did not run this attack on truly anonymous accounts. They built their evaluation datasets using profiles where ground truth was already established (users who had publicly linked their own accounts), stripped the obvious identifiers, and then tested whether the system could recover the link. This is the ethical way to do this research.
But the capabilities they demonstrate use only publicly available models and standard APIs. Any moderately resourced actor can replicate what they built. The code is not published and the exact prompts are not disclosed, but the architecture is described in enough detail that this is not a significant barrier.
The paper explicitly discusses who the relevant threat actors are: governments wanting to surveil dissidents, journalists, or activists; corporations wanting to connect “anonymous” forum complaints or discussions to customer profiles for hyper-targeted advertising, or to track employee sentiment; stalkers building comprehensive profiles of targets; and hostile groups identifying opponents and building social engineering approaches tailored to them.
None of these are hypothetical. All of them now have a tool that costs pocket change per target.
One detail that stuck with me: the attack still achieves roughly 9% recall at 90% precision even in a setting where only 1 in 10,000 users has a matching candidate in the database. That is not a trivial number when you are operating at internet scale. 9% recall across a million matchable users is 90,000 correctly identified individuals.
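The same figures also tell you how many wrong guesses come along for the ride. The sketch below works that out from recall and precision alone; the function name is hypothetical, and the one million matchable users is the illustrative figure from the paragraph above, not a number from the paper.

```python
def expected_outcomes(n_matchable, recall, precision):
    # correct = matchable users the attacker identifies correctly
    # guesses = total guesses committed to, since precision = correct / guesses
    # wrong   = committed guesses that point at the wrong person
    correct = n_matchable * recall
    guesses = correct / precision
    return {
        "correct": round(correct),
        "guesses": round(guesses),
        "wrong": round(guesses - correct),
    }


outcome = expected_outcomes(n_matchable=1_000_000, recall=0.09, precision=0.90)
print(outcome)
```

At 90% precision, one guess in ten is wrong. On these assumptions that means about 10,000 people confidently matched to the wrong identity alongside the 90,000 correct matches, which is its own kind of harm.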
The Defense Problem Is Real
The paper discusses mitigations, and it is honest about their limitations.
API rate limiting could slow automated scraping. Platform policies could restrict bulk data exports. LLM providers could try to detect deanonymization attempts in model usage patterns. Better anonymization frameworks could help, though the paper notes that even sophisticated text sanitization leaves enough semantic signal for re-identification.
The authors offer one conclusion that is hard to argue with: “not revealing any data on online platforms is difficult, as the data we use is the very content that makes online communities valuable.”
That is the trap. The things that make you identifiable are often the things that make your contributions worth anything. If you discuss your industry from experience, mention where you live when it’s relevant, reference things that happened in your life, your posts become useful to the community and useful to anyone trying to find out who you are. Sanitizing all of that away means contributing nothing.
There is no easy fix being offered here. The researchers propose platform-level and policy-level responses, but are candid that preventing LLM-based deanonymization while preserving what makes communities worthwhile is an unsolved problem.
What This Actually Means for How You Think About Online Privacy
The paper ends with a sentence that reads almost like a warning label: “the privacy assumptions underlying much of today’s internet no longer hold.”
For most people, this means a few practical things are worth considering.
Persistent pseudonyms accumulate risk over time. Every post you make under a username adds a data point. A single comment about living in a mid-size European city is not identifying. A hundred comments over three years that mention your field, your commute time, a health condition, your partner’s job, the software tools you use at work, and the neighborhood café you frequent are, taken together. The researchers found that users who shared more content across more topics were substantially easier to identify, and that the risk increases with each post.
Posting under different usernames per topic offers more protection than a single pseudonymous identity, but the paper shows that even split profiles can be linked if the underlying user maintains consistent writing patterns, interests, or circumstances across them.
The threat is not symmetric across populations. For most people reading this, the practical risk of being targeted by a deanonymization campaign is low. The cost being low does not mean someone will spend it on you. The risk profile is very different for activists, journalists, abuse survivors, whistleblowers, or anyone posting under a pseudonym specifically because identification would have serious consequences.
Conclusion: Practical Obscurity Was Always Fragile. Now It’s Gone.
Privacy researchers have known for decades that anonymity on the internet is harder than it looks. The Netflix Prize attack was a wake-up call in 2008. Various browser fingerprinting and behavioral tracking revelations followed. Each time, the response from the tech community was roughly “yes, but targeted attacks are expensive, so most people are fine.”
That caveat is gone now. The cost floor for a precise, scalable, unstructured-text-based identity matching attack has dropped to somewhere between a coffee and a sandwich per person.
What the ETH Zurich team has done is not invent a new threat. Deanonymization was always theoretically possible. What they have done is put a price tag on it that makes it accessible to almost anyone, and demonstrated that it works reliably on real platforms with real data.
The honest takeaway is not to panic, but to update your mental model. Pseudonymity was never anonymity. It was inconvenience. That inconvenience just got a lot cheaper to overcome.