Where Fake Citations Are Made: Tracing Field-Level Hallucination to Specific Neurons in LLMs
Summary
A study spanning nine large language models (LLMs) and 108,000 generated references examines fabricated citations and finds that author names are hallucinated significantly more often than other citation fields. Citation style has no measurable effect on hallucination, while reasoning-oriented distillation degrades recall. Probes trained on one field do not generalize to others, which suggests that hallucination signals are field-specific. Applying elastic-net regularization with stability selection to Qwen2.5-32B-Instruct identifies a sparse set of "field-specific hallucination neurons" (FH-neurons). Causal intervention confirms their role: amplifying these neurons increases hallucination, while suppressing them improves performance across fields, with larger gains in some fields than in others. The work proposes a lightweight method for detecting and mitigating citation hallucination using only internal model signals.
Key takeaway
For AI engineers developing or deploying LLMs, the key point is that citation hallucination is field-specific and most frequent for author names. Consider analyzing internal model signals, for example by identifying and suppressing FH-neurons, to improve citation accuracy and reliability in your applications, especially where factual integrity is paramount.
Key insights
LLMs hallucinate citation fields independently, with author names being the most frequent error source.
Principles
- Hallucination signals are field-specific.
- Reasoning distillation degrades recall.
Method
Elastic-net regularization with stability selection identifies field-specific hallucination neurons (FH-neurons) in LLMs. Amplifying or suppressing these neurons then tests their causal role, and suppression mitigates errors.
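The selection step can be sketched roughly as follows. This is a minimal illustration on synthetic data, not the study's implementation: the proximal-gradient solver, the hyperparameters (lam, l1_ratio, frac, threshold), the planted neuron indices, and both function names are assumptions made for the example. The idea is to fit a sparse elastic-net classifier on many random subsamples and keep only the neurons that receive a non-zero weight in most of them.

```python
import numpy as np

def fit_elastic_net_logistic(X, y, lam=0.1, l1_ratio=0.9, lr=0.1, n_iter=300):
    """Logistic regression with an elastic-net penalty, fit by proximal gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
        # Gradient of the smooth part: logistic loss plus the L2 term.
        grad_w = X.T @ (p - y) / n + lam * (1.0 - l1_ratio) * w
        w = w - lr * grad_w
        b = b - lr * np.mean(p - y)
        # Soft-thresholding handles the L1 term and produces exact zeros.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * lam * l1_ratio, 0.0)
    return w, b

def stability_selection(X, y, n_rounds=30, frac=0.5, threshold=0.8, seed=0):
    """Keep features whose weight is non-zero in at least `threshold` of the subsample fits."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        w, _ = fit_elastic_net_logistic(X[idx], y[idx])
        counts += (w != 0)
    freq = counts / n_rounds
    return np.flatnonzero(freq >= threshold), freq

# Synthetic stand-in for neuron activations: 400 examples, 50 "neurons".
# y = 1 marks a hallucinated field; three planted neurons carry the signal.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=400).astype(float)
X = rng.normal(size=(400, 50))
informative = [3, 17, 42]          # hypothetical indices, for illustration only
X[:, informative] += (2.0 * y - 1.0)[:, None]

selected, freq = stability_selection(X, y)
print("stable neurons:", selected.tolist())
```

Running one selection per citation field (author, title, year, and so on) would yield one sparse neuron set per field, which matches the field-specific framing above. On real activations the subsample count and selection threshold would need tuning.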
In practice
- Suppress FH-neurons to reduce citation hallucination.
- Focus on author name generation in LLM fine-tuning.
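The suppression idea can be illustrated with a toy activation matrix. This is a sketch under stated assumptions, not the study's intervention code: `scale_neurons`, the neuron indices, and the scaling factors are hypothetical, and in a real model the same scaling would be applied to a layer's activations during the forward pass.

```python
import numpy as np

def scale_neurons(hidden, neuron_idx, alpha):
    """Return a copy of `hidden` with the chosen neurons multiplied by `alpha`.

    alpha < 1 suppresses the neurons, alpha > 1 amplifies them.
    """
    out = np.array(hidden, dtype=float, copy=True)
    out[..., neuron_idx] *= alpha
    return out

# Toy activations with shape (tokens, neurons).
rng = np.random.default_rng(0)
hidden = rng.normal(size=(4, 16))
fh_neurons = [2, 7, 11]            # hypothetical FH-neuron indices

suppressed = scale_neurons(hidden, fh_neurons, alpha=0.0)   # switch the neurons off
amplified = scale_neurons(hidden, fh_neurons, alpha=2.0)    # double their activation
```

Comparing citation accuracy under the suppressed, unmodified, and amplified settings is what makes the test causal rather than correlational.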
Topics
- LLM Hallucination
- Citation Falsification
- Field-Specific Hallucination Neurons
- Causal Intervention
- Qwen2.5-32B-Instruct
Best for: AI Engineer, Machine Learning Engineer, Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.