Where Fake Citations Are Made: Tracing Field-Level Hallucination to Specific Neurons in LLMs

2025-01-30 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

A study of nine large language models (LLMs) and 108,000 generated references finds that LLMs frequently produce fictitious citations, with author names the most error-prone field across all models and settings. Citation style has no measurable effect on hallucination rates, while reasoning-oriented distillation degrades recall. Probes trained on one bibliographic field (e.g., title) transfer at near-chance levels to others (e.g., authors), indicating that hallucination signals are field-specific rather than generalized. The researchers applied elastic-net regularization with stability selection to neuron-level CETT values in Qwen2.5-32B-Instruct and identified a sparse set of field-specific hallucination neurons (FH-neurons). Causal intervention confirmed their role: amplifying these neurons increased hallucination, while suppressing them improved accuracy, particularly for the title and author fields. The results suggest a lightweight approach to detecting and mitigating citation hallucination using internal model signals.
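The amplify/suppress intervention can be pictured as rescaling a few hidden units before a layer's output projection. The following is a minimal NumPy sketch on a toy MLP block, not the study's implementation: the weights, shapes, the neuron indices in `fh_neurons`, and the `scale` parameter are all hypothetical and are not taken from Qwen2.5-32B-Instruct.

```python
import numpy as np

def mlp_forward(x, w_up, w_down, neuron_ids=(), scale=1.0):
    """Toy MLP block: ReLU(x @ w_up) @ w_down, with optional neuron scaling.

    scale > 1 amplifies the chosen hidden neurons, scale < 1 suppresses them,
    and scale == 0 removes their contribution entirely.
    """
    hidden = np.maximum(x @ w_up, 0.0)            # shape (batch, d_hidden)
    if len(neuron_ids) > 0:
        hidden[:, list(neuron_ids)] *= scale      # touch only the selected neurons
    return hidden @ w_down

# Random stand-ins for real model weights and inputs (assumption).
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
w_up = rng.normal(size=(8, 16))
w_down = rng.normal(size=(16, 8))
fh_neurons = [3, 7]                               # hypothetical neuron indices

baseline = mlp_forward(x, w_up, w_down)
suppressed = mlp_forward(x, w_up, w_down, fh_neurons, scale=0.0)
amplified = mlp_forward(x, w_up, w_down, fh_neurons, scale=2.0)
```

In a real model the same rescaling would typically be applied through a forward hook on the relevant layer, and the effect would be measured on citation accuracy rather than on raw outputs.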

Key takeaway

For AI engineers and research scientists building or deploying LLMs for academic tasks, the central finding is that citation hallucination is not a monolithic problem: it is field-specific, and author names are the weakest field. Implement targeted post-hoc verification for author fields, and consider neuron-level interventions that suppress the identified FH-neurons, especially in critical applications such as drafting related-work sections or bibliographies. Both measures can improve factual accuracy without relying solely on external retrieval.

Key insights

LLM citation hallucination is field-specific, with distinct neural mechanisms for different bibliographic components.

Principles

Author names are consistently the most error-prone field in LLM-generated citations.
Citation style has no measurable effect on hallucination rates.
Reasoning-oriented distillation can degrade factual recall in LLMs.

Method

The study labeled 108,000 generated references with a two-stage verification pipeline (OpenAlex and GPT-5.4-mini), then used linear probing and elastic-net regularization with stability selection to identify field-specific hallucination neurons (FH-neurons).
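To make the selection step concrete, here is a minimal sketch of elastic-net regularization combined with stability selection, assuming a binary "hallucinated or not" label per reference. It is not the study's code: the random matrix `X` stands in for neuron-level CETT values, the proximal-gradient solver is a swapped-in choice for brevity, and every hyperparameter (`l1`, `l2`, `lr`, `n_rounds`, `threshold`) is an illustrative assumption.

```python
import numpy as np

def fit_elastic_net_logistic(X, y, l1=0.1, l2=0.01, lr=0.5, n_iter=200):
    """Elastic-net logistic regression fitted by proximal gradient descent."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w)))        # predicted probabilities
        grad = X.T @ (p - y) / n + l2 * w          # log-loss gradient plus ridge term
        w = w - lr * grad
        # Soft-thresholding: the proximal step for the L1 penalty.
        w = np.sign(w) * np.maximum(np.abs(w) - lr * l1, 0.0)
    return w

def stability_selection(X, y, n_rounds=30, threshold=0.6, seed=0):
    """Refit on random half-samples; keep features nonzero in >= threshold of rounds."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)
        counts += fit_elastic_net_logistic(X[idx], y[idx]) != 0
    freq = counts / n_rounds
    return np.flatnonzero(freq >= threshold), freq

# Synthetic stand-in data (assumption): 400 references, 30 neurons,
# with only neurons 0, 1 and 2 carrying any signal about the label.
rng = np.random.default_rng(42)
X = rng.normal(size=(400, 30))
y = (X[:, :3].sum(axis=1) > 0).astype(float)

selected, freq = stability_selection(X, y)
```

The point of the resampling loop is that a neuron is kept only if the sparse model picks it repeatedly across subsamples, which guards against neurons that look predictive in a single fit by chance. On this synthetic data the three signal-carrying columns should dominate the selection frequencies.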

In practice

Prioritize verification of author names in LLM-generated bibliographies.
Consider suppressing the identified FH-neurons at inference time to improve citation accuracy.
Avoid reasoning-oriented distillation if factual recall is a critical requirement.
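As an illustration of the first recommendation, the sketch below compares the author field of a generated reference against a trusted record. It assumes the trusted author list has already been retrieved from a bibliographic database (the lookup itself is not shown), and it matches on family name plus first initial, which is a simplification. The function names and the example author names are hypothetical.

```python
import unicodedata

def normalize_author(name):
    """Reduce 'Given Family' or 'Family, Given' to a (family, first_initial) key."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = text.lower().strip()
    if "," in text:
        family, given = [part.strip() for part in text.split(",", 1)]
    else:
        parts = text.split()
        if not parts:
            return "", ""
        family, given = parts[-1], " ".join(parts[:-1])
    return family, given[:1]

def check_authors(generated, reference):
    """Report generated authors absent from the trusted record, and vice versa."""
    gen = {normalize_author(a) for a in generated if a.strip()}
    ref = {normalize_author(a) for a in reference if a.strip()}
    return {
        "unsupported": sorted(gen - ref),   # in the LLM output, not in the record
        "missing": sorted(ref - gen),       # in the record, not in the LLM output
        "ok": gen == ref,
    }

# Made-up names for illustration only.
generated = ["J. Smith", "Ana García", "Wei Chen"]
reference = ["Smith, John", "Garcia, Ana", "Li Zhang"]
report = check_authors(generated, reference)
```

Here `report["unsupported"]` flags the one generated author with no counterpart in the trusted record, which is the kind of field-level error the study found most often. Matching on initials can produce false agreement between different people who share a family name, so a production check would need stricter rules.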

Topics

Citation Hallucination
Field-Specific Hallucination Neurons
LLM Interpretability
Neuron-Level Intervention
Qwen2.5-32B-Instruct

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.