Where Fake Citations Are Made: Tracing Field-Level Hallucination to Specific Neurons in LLMs
Summary
A study across nine large language models (LLMs) and 108,000 generated references reveals that LLMs frequently produce fictitious citations, with author names being the most error-prone field across all models and settings. Citation style has no measurable effect on hallucination rates, while reasoning-oriented distillation degrades recall. Probes trained on one bibliographic field (e.g., title) transfer at near-chance levels to others (e.g., authors), indicating that hallucination signals are field-specific rather than generalized. Researchers applied elastic-net regularization with stability selection to neuron-level CETT values in Qwen2.5-32B-Instruct, identifying a sparse set of field-specific hallucination neurons (FH-neurons). Causal intervention confirmed their role: amplifying these neurons increased hallucination, while suppressing them improved accuracy, particularly for title and author fields, suggesting a lightweight approach to detecting and mitigating citation hallucination using internal model signals.
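The neuron-selection step described above (elastic-net regularization combined with stability selection) can be sketched on toy data. This is a minimal illustration, not the authors' code: the squared-error objective, the hyperparameters (`alpha`, `l1_ratio`, `n_rounds`, `keep_freq`), and the synthetic data are all assumptions, and the random feature matrix merely stands in for per-neuron CETT values.

```python
import random

def soft_threshold(value, threshold):
    """Shrink value toward zero by threshold (the L1 proximal step)."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0

def elastic_net(X, y, alpha=0.3, l1_ratio=0.9, n_iter=50):
    """Coordinate descent for
    (1/2n)||y - Xw||^2 + alpha*l1_ratio*||w||_1 + 0.5*alpha*(1-l1_ratio)*||w||^2.
    No intercept is fitted; the toy data is roughly centered."""
    n, p = len(X), len(X[0])
    w = [0.0] * p
    residual = list(y)  # equals y - Xw while w is all zeros
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            # Correlation of feature j with the residual that excludes feature j.
            rho = sum(X[i][j] * (residual[i] + X[i][j] * w[j]) for i in range(n)) / n
            new_w = soft_threshold(rho, alpha * l1_ratio) / (col_sq[j] + alpha * (1 - l1_ratio))
            delta = w[j] - new_w
            if delta != 0.0:
                for i in range(n):
                    residual[i] += X[i][j] * delta
            w[j] = new_w
    return w

def stability_selection(X, y, n_rounds=30, keep_freq=0.6, seed=0):
    """Refit on random half-samples and keep features that are
    nonzero in at least keep_freq of the rounds."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_rounds):
        idx = rng.sample(range(n), n // 2)
        w = elastic_net([X[i] for i in idx], [y[i] for i in idx])
        for j in range(p):
            if w[j] != 0.0:
                counts[j] += 1
    return [j for j in range(p) if counts[j] / n_rounds >= keep_freq]

def make_toy_data(n=80, p=10, seed=1):
    """Synthetic stand-in: only features 0 and 1 drive the target."""
    rng = random.Random(seed)
    X = [[rng.gauss(0, 1) for _ in range(p)] for _ in range(n)]
    y = [3.0 * row[0] - 2.0 * row[1] + rng.gauss(0, 0.1) for row in X]
    return X, y

X, y = make_toy_data()
stable = stability_selection(X, y)
print("stable feature indices:", stable)
```

The design point is that a feature is kept only if it survives the sparse fit across many resamples, which guards against neurons that are selected by chance in a single fit.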
Key takeaway
For AI engineers and research scientists developing or deploying LLMs for academic tasks, this research shows that citation hallucination is not a monolithic problem: it is field-specific, and author names are the most error-prone field. Implement targeted post-hoc verification for author fields, and consider neuron-level interventions that suppress the identified FH-neurons in critical applications such as drafting related work or bibliographies. Together, these measures can improve factual accuracy without relying solely on external retrieval.
Key insights
LLM citation hallucination is field-specific: the internal signals that predict errors in one bibliographic field (e.g., title) do not carry over to others (e.g., authors).
Principles
- Author names are consistently the most error-prone field in LLM-generated citations.
- Citation style has negligible impact on hallucination rates.
- Reasoning-oriented distillation can degrade factual recall in LLMs.
Method
The study used a two-stage verification pipeline with OpenAlex and GPT-5.4-mini to label 108,000 generated references, followed by linear probing and elastic-net regularization to identify field-specific hallucination neurons.
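Field-level labeling of this kind can be illustrated with a small comparison routine. The sketch below assumes a generated reference has already been matched to a trusted record (for example, one retrieved from a bibliographic index); the record layout, the set of fields, the `title_threshold` value, and the placeholder references are hypothetical and are not taken from the study's pipeline.

```python
from difflib import SequenceMatcher
import unicodedata

def normalize(text):
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", text)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return " ".join(cleaned.split())

def surname(name):
    """Extract a surname from 'Last, First' or 'First Last' (a simplification)."""
    if "," in name:
        return normalize(name.split(",")[0])
    parts = normalize(name).split()
    return parts[-1] if parts else ""

def label_fields(generated, matched, title_threshold=0.9):
    """Return a per-field verdict: True means the field agrees with the record."""
    title_sim = SequenceMatcher(
        None, normalize(generated["title"]), normalize(matched["title"])
    ).ratio()
    return {
        "title": title_sim >= title_threshold,
        "authors": [surname(a) for a in generated["authors"]]
                   == [surname(a) for a in matched["authors"]],
        "year": generated["year"] == matched["year"],
        "venue": normalize(generated["venue"]) == normalize(matched["venue"]),
    }

# Placeholder records, invented purely for illustration.
generated = {
    "title": "a toy study of widgets",
    "authors": ["Ada Example", "Cy Placeholder"],
    "year": 2020,
    "venue": "Journal of Examples",
}
matched = {
    "title": "A Toy Study of Widgets.",
    "authors": ["Example, Ada", "Sample, Bo"],
    "year": 2020,
    "venue": "Journal of Examples",
}

labels = label_fields(generated, matched)
print(labels)
```

Scoring each field separately is what makes it possible to observe that one field (here, authors) fails while the rest of the reference is correct.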
In practice
- Prioritize verification of author names in LLM-generated bibliographies.
- Consider suppressing the identified FH-neurons at inference time to improve citation accuracy, particularly for title and author fields.
- Avoid reasoning-oriented distillation if factual recall is a critical requirement.
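The neuron-level intervention mentioned above amounts to rescaling selected hidden activations before they are projected back into the model. The following toy sketch shows only that arithmetic: the activations, the projection weights, and the `fh_neurons` index list are made-up values, and a real intervention would apply the same scaling inside a model's feed-forward layer.

```python
def scale_neurons(hidden, neuron_ids, factor):
    """Return a copy of hidden with the chosen neurons multiplied by factor.
    factor=0.0 suppresses them; factor>1.0 amplifies them."""
    ids = set(neuron_ids)
    return [h * factor if i in ids else h for i, h in enumerate(hidden)]

def down_project(hidden, weights):
    """Project hidden activations to the output space.
    weights[k][i] is the contribution of hidden neuron i to output dim k."""
    return [sum(w * h for w, h in zip(row, hidden)) for row in weights]

# Toy values, chosen only to make the effect easy to follow.
hidden = [0.5, 2.0, -1.0, 0.25]
weights = [
    [1.0, 0.5, 0.0, 2.0],
    [0.0, -1.0, 1.0, 0.0],
]
fh_neurons = [1]  # hypothetical index of a field-specific hallucination neuron

baseline = down_project(hidden, weights)
suppressed = down_project(scale_neurons(hidden, fh_neurons, 0.0), weights)
amplified = down_project(scale_neurons(hidden, fh_neurons, 2.0), weights)

print("baseline:  ", baseline)
print("suppressed:", suppressed)
print("amplified: ", amplified)
```

Because only a sparse set of neurons is rescaled, the rest of the layer's computation is untouched, which is why this kind of intervention is considered lightweight.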
Topics
- Citation Hallucination
- Field-Specific Hallucination Neurons
- LLM Interpretability
- Neuron-Level Intervention
- Qwen2.5-32B-Instruct
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.