Where Fake Citations Are Made: Tracing Field-Level Hallucination to Specific Neurons in LLMs

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A study of nine large language models (LLMs) and 108,000 generated references examines fabricated citations and finds that author names are hallucinated significantly more often than other citation fields. Citation style has no measurable effect, while reasoning-oriented distillation degrades recall. Probes trained on one field do not generalize to others, which suggests that hallucination signals are field-specific. Applying elastic-net regularization with stability selection to Qwen2.5-32B-Instruct identifies a sparse set of "field-specific hallucination neurons" (FH-neurons). Causal intervention confirms their role: amplifying these neurons increases hallucination, while suppressing them improves performance across fields, though the gains are larger in some areas than in others. The work proposes a lightweight method for detecting and mitigating citation hallucination using only internal model signals.

Key takeaway

For AI engineers developing or deploying LLMs, the key point is that citation hallucination is field-specific and most severe for author names. Where factual integrity matters, consider analyzing internal model signals, for example by identifying and suppressing FH-neurons, to improve citation accuracy and reliability.
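To make the idea of suppressing or amplifying neurons concrete, here is a minimal sketch that rescales chosen hidden units in an activation array. The summary does not say how the study performs its intervention, so multiplicative scaling, the helper name `scale_neurons`, the factor `alpha`, and the neuron indices are all illustrative assumptions rather than the authors' procedure.

```python
import numpy as np

def scale_neurons(hidden, neuron_idx, alpha):
    """Return a copy of `hidden` with the chosen units multiplied by `alpha`.

    hidden:     array of shape (..., d_hidden), e.g. (tokens, d_hidden)
    neuron_idx: list of unit indices to intervene on (hypothetical FH-neurons)
    alpha:      0.0 suppresses fully, values above 1.0 amplify
    """
    out = np.array(hidden, dtype=float, copy=True)  # leave the input untouched
    out[..., neuron_idx] *= alpha
    return out

# Toy activations: 2 token positions, 8 hidden units, all equal to 1.
hidden = np.ones((2, 8))
fh_neurons = [1, 5]  # hypothetical indices, not taken from the study

suppressed = scale_neurons(hidden, fh_neurons, alpha=0.0)
amplified = scale_neurons(hidden, fh_neurons, alpha=2.0)
```

In a real model the same scaling would typically be applied inside a forward hook on the layer that hosts the selected units. Which layer and which units to use would have to come from the selection step, not from this sketch.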

Key insights

LLMs hallucinate citation fields independently, with author names being the most frequent error source.

Principles

Method

Elastic-net regularization with stability selection identifies field-specific hallucination neurons (FH-neurons) in LLMs. Intervening on these neurons, for example by suppressing them, can then mitigate errors.
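The selection step can be sketched as follows, on synthetic data. This is a minimal illustration, not the study's pipeline: it fits an elastic-net logistic regression by proximal gradient descent on random half-samples and keeps the features that receive a non-zero weight in most rounds. The toy "activations", the planted signal indices, and every hyperparameter (`lam`, `l1_ratio`, `n_rounds`, `threshold`) are assumptions chosen for the example.

```python
import numpy as np

def soft_threshold(w, t):
    """Proximal operator of the L1 penalty: shrink towards zero by t."""
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def fit_elastic_net_logreg(X, y, lam=0.08, l1_ratio=0.9, lr=0.5, n_iter=300):
    """Elastic-net logistic regression via proximal gradient descent."""
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))        # predicted probabilities
        grad_w = X.T @ (p - y) / n + lam * (1.0 - l1_ratio) * w  # loss + L2 part
        grad_b = np.mean(p - y)
        w = soft_threshold(w - lr * grad_w, lr * lam * l1_ratio)  # L1 part
        b -= lr * grad_b
    return w, b

def stability_selection(X, y, n_rounds=50, threshold=0.8, seed=0, **fit_kwargs):
    """Keep features that get a non-zero weight in >= threshold of the rounds."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    counts = np.zeros(d)
    for _ in range(n_rounds):
        idx = rng.choice(n, size=n // 2, replace=False)  # random half-sample
        w, _ = fit_elastic_net_logreg(X[idx], y[idx], **fit_kwargs)
        counts += (w != 0)
    freq = counts / n_rounds
    return np.flatnonzero(freq >= threshold), freq

def make_toy_activations(n=400, d=60, signal=(3, 17, 42), seed=1):
    """Synthetic 'neuron activations'; only the `signal` units drive the label."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))
    logits = 2.5 * X[:, list(signal)].sum(axis=1)
    y = (rng.random(n) < 1.0 / (1.0 + np.exp(-logits))).astype(float)
    return X, y

X, y = make_toy_activations()   # y = 1 stands in for "this field was hallucinated"
selected, freq = stability_selection(X, y)
print("stable features:", selected.tolist())
```

Repeating the fit on subsamples is what makes the selected set robust: a feature that only passes the L1 threshold by chance on one draw of the data will rarely pass it in 80% of the draws. With real activations, one such selection would presumably be run per citation field, since the summary reports that the signals do not transfer across fields.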

Best for: AI Engineer, Machine Learning Engineer, Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.