The Ghost Couple: Correlated LLM Name Priors and Their Haunting of the Web and Academic Publishing

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, AI Content Integrity · Depth: Expert, medium

Summary

A study by Michał Brzozowski and Neo Christopher Chung finds that large language models (LLMs) favor not only individual high-probability names but correlated ensembles of fictional characters, such as "Elena Vasquez + Marcus Chen + Amara Okafor" for Claude, "Aris Thorne + Lena Petrova" for Gemini, and "Elara Voss" for GPT. These name priors are specific to model family and version, and the names co-occur at consistent rates across independent generations. The authors report that the priors are actively suppressed at model release boundaries, which leaves identifiable behavioral fingerprints. The study documents a significant downstream consequence on Zenodo, a CERN-operated repository, where it found 1,655 "ghost-authored" records that cite nonexistent journals and carry fabricated publication dates. DataCite timestamps confirm deliberate backdating, with 991 records registered in a single month, all carrying real DOIs. Ghost names also appear on ResearchGate, where they form synthetic research groups, with publication dates providing a temporal proxy for model deployment.
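The backdating signal described above can be illustrated with a minimal sketch. It assumes each record has already been exported as a plain dictionary with a claimed publication date and a DOI registration timestamp. The field names `claimed_publication` and `doi_registered`, the one-year threshold, and the sample records are hypothetical choices for illustration, not the study's pipeline or data.

```python
from collections import Counter
from datetime import date


def backdating_gap_days(claimed_publication: str, doi_registered: str) -> int:
    """Days between the claimed publication date and the DOI registration date."""
    claimed = date.fromisoformat(claimed_publication)
    # Keep only the YYYY-MM-DD part of the ISO 8601 registration timestamp.
    registered = date.fromisoformat(doi_registered[:10])
    return (registered - claimed).days


def flag_backdated(records, min_gap_days=365):
    """Return records whose DOI was registered long after the claimed date.

    min_gap_days is an arbitrary illustrative threshold (assumption).
    """
    return [
        r for r in records
        if backdating_gap_days(r["claimed_publication"], r["doi_registered"]) >= min_gap_days
    ]


def registrations_per_month(records):
    """Count DOI registrations per YYYY-MM to expose bursts in a single month."""
    return Counter(r["doi_registered"][:7] for r in records)


# Made-up records for illustration only (not taken from the study).
records = [
    {"id": "rec-1", "claimed_publication": "2019-05-01", "doi_registered": "2025-03-14T09:30:00+00:00"},
    {"id": "rec-2", "claimed_publication": "2025-03-10", "doi_registered": "2025-03-15T11:00:00+00:00"},
    {"id": "rec-3", "claimed_publication": "2021-11-20", "doi_registered": "2025-03-20T08:00:00+00:00"},
]

if __name__ == "__main__":
    print([r["id"] for r in flag_backdated(records)])
    print(registrations_per_month(records))
```

A large gap alone is weak evidence, since legitimate authors also deposit old work late. The gap becomes more telling when combined with a registration burst and a journal that cannot be found.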

Key takeaway

For research scientists and academic publishers evaluating scholarly integrity, this research highlights a critical vulnerability: fabricated publications attributed to LLM-generated ghost authors are entering repositories like Zenodo. Verify new submissions by cross-referencing author names and journal details against established databases. Be wary of records with suspicious publication dates or correlated author ensembles, as both can indicate AI-generated content that undermines research credibility and data integrity.
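One way to screen for correlated author ensembles is sketched below. It assumes a submission's author list is available as a list of strings and uses only the ensembles quoted in the summary. The function names, the normalization rule, and the two-match threshold are illustrative assumptions, and a real screen would need a maintained, versioned name list.

```python
import unicodedata

# Ensembles quoted in the summary. Single-name priors such as "Elara Voss"
# are left out because this sketch relies on co-occurrence.
KNOWN_ENSEMBLES = {
    "Claude": {"Elena Vasquez", "Marcus Chen", "Amara Okafor"},
    "Gemini": {"Aris Thorne", "Lena Petrova"},
}


def normalize(name: str) -> str:
    """Strip accents, case-fold, and collapse whitespace for loose matching."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return " ".join(stripped.casefold().split())


def ensemble_hits(authors, min_matches=2):
    """Map model family -> matched names when enough ensemble members co-occur.

    min_matches=2 is an assumed threshold: one matching name proves little,
    because real people carry these names too.
    """
    normalized = {normalize(a) for a in authors}
    hits = {}
    for family, ensemble in KNOWN_ENSEMBLES.items():
        matched = sorted(n for n in ensemble if normalize(n) in normalized)
        if len(matched) >= min_matches:
            hits[family] = matched
    return hits


if __name__ == "__main__":
    print(ensemble_hits(["Elena Vásquez", "marcus chen", "J. Smith"]))
    print(ensemble_hits(["Marcus Chen"]))
```

A hit is a prompt for manual review, not proof of fabrication. It should be weighed together with journal and date checks.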

Key insights

LLMs generate correlated fictional character ensembles, not just individual names, leading to widespread ghost authorship in academic repositories.

Method

The study identified correlated name priors by analyzing independent LLM generations and traced their downstream impact by scanning Zenodo and ResearchGate for ghost-authored records and fabricated publication metadata.
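The first step of the method, measuring how often names appear together across independent generations, might look like the sketch below. The summary does not say which statistic the study uses, so the joint rate and Jaccard overlap here are stand-in choices. The sample texts are invented, and substring matching is a simplification.

```python
from itertools import combinations


def cooccurrence_rates(generations, names):
    """Compute pairwise co-occurrence of names across independent generations.

    For each pair, "joint" is the share of all generations containing both
    names, and "jaccard" is the share of generations containing either name
    that contain both. Both statistics are assumptions, not the study's metric.
    """
    # Which candidate names occur in each generated text (case-insensitive).
    present = [
        {n for n in names if n.lower() in text.lower()}
        for text in generations
    ]
    total = len(generations)
    rates = {}
    for a, b in combinations(sorted(names), 2):
        both = sum(1 for p in present if a in p and b in p)
        either = sum(1 for p in present if a in p or b in p)
        rates[(a, b)] = {
            "joint": both / total if total else 0.0,
            "jaccard": both / either if either else 0.0,
        }
    return rates


# Invented sample outputs standing in for independent model generations.
generations = [
    "Dr. Elena Vasquez met Marcus Chen at the lab.",
    "Elena Vasquez reviewed the data alone.",
    "Marcus Chen and Elena Vasquez published the results.",
    "A story with no recurring names.",
]

if __name__ == "__main__":
    print(cooccurrence_rates(generations, ["Elena Vasquez", "Marcus Chen"]))
```

Comparing these rates across model families and versions would show whether an ensemble is specific to one release. That comparison needs separately sampled generations per model.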

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, Research Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.