Why Fine-Tuning Encourages Hallucinations and How to Fix It

Source: cs.AI updates on arXiv.org · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

Large language models (LLMs) often hallucinate factually incorrect statements, a problem exacerbated by supervised fine-tuning (SFT) when models acquire new factual information. This work reinterprets SFT-induced hallucinations as "factual forgetting," a form of catastrophic forgetting from continual learning. Researchers propose two mitigation strategies: reducing factual plasticity by freezing parameter groups, which is effective when new fact acquisition is not desired, and self-distillation, which enables new factual learning while minimizing forgetting. Self-distillation reduces factual forgetting from approximately 15% to 3%. The study investigates the underlying mechanism, finding that interference among overlapping semantic representations is a primary driver, rather than capacity limitations or behavior cloning. Experiments with Qwen 2.5 (1.5B, 8B) and LLaMA 3.1 (8B) models on the EntityQuestions dataset support these findings, showing that self-distillation mitigates this interference by regularizing output-distribution drift.
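To make a figure such as "15% to 3%" concrete, here is a minimal sketch of one plausible way to measure factual forgetting: the share of questions a model answered correctly before fine-tuning that it gets wrong afterwards. The summary does not state the exact metric used in the paper, so this definition, the function name factual_forgetting_rate, and the toy data are all assumptions for illustration.

```python
def factual_forgetting_rate(before, after):
    """Share of previously known facts that are lost after fine-tuning.

    before, after: dicts mapping a question id to True (answered correctly)
    or False. This definition is an assumption, not the paper's stated metric.
    """
    known = [q for q, ok in before.items() if ok]
    if not known:
        return 0.0
    # A question missing from `after` is counted as forgotten.
    forgotten = sum(1 for q in known if not after.get(q, False))
    return forgotten / len(known)


# Toy correctness labels, made up for illustration (not results from the paper).
before = {"q1": True, "q2": True, "q3": True, "q4": True, "q5": False}
after = {"q1": True, "q2": False, "q3": True, "q4": True, "q5": True}

rate = factual_forgetting_rate(before, after)
print(rate)  # 1 of the 4 previously known facts was lost
```

Only questions the model knew beforehand enter the denominator, so newly learned facts (q5 here) do not mask forgetting.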

Key takeaway

For AI engineers and research scientists fine-tuning LLMs, the critical point is that SFT-induced hallucinations are a form of factual forgetting caused by representational interference. If the goal is task adaptation without new factual knowledge, selectively freezing feed-forward network (FFN) parameters can preserve existing knowledge. When new facts must be acquired, self-distillation reduces forgetting from ~15% to ~3% by stabilizing output distributions, retaining both plasticity and stability.
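The freezing option can be sketched as a name-based filter over a model's parameters. This is an illustrative sketch, not the paper's procedure: the helper freeze_matching is hypothetical, and the ".mlp." pattern assumes feed-forward weights are named as in common open-source LLaMA-style and Qwen-style implementations. Stand-in objects are used so the example runs without a deep learning framework.

```python
from types import SimpleNamespace


def freeze_matching(named_params, patterns=(".mlp.",)):
    """Disable gradients for parameters whose name contains any pattern.

    named_params: iterable of (name, param) pairs, where each param exposes a
    writable `requires_grad` attribute. Returns the names that were frozen.
    The default pattern is an assumption about how FFN weights are named.
    """
    frozen = []
    for name, param in named_params:
        if any(pattern in name for pattern in patterns):
            param.requires_grad = False
            frozen.append(name)
    return frozen


# Stand-in parameters (hypothetical names) instead of real model weights.
toy_params = {
    "model.layers.0.self_attn.q_proj.weight": SimpleNamespace(requires_grad=True),
    "model.layers.0.mlp.gate_proj.weight": SimpleNamespace(requires_grad=True),
    "model.layers.0.mlp.down_proj.weight": SimpleNamespace(requires_grad=True),
    "lm_head.weight": SimpleNamespace(requires_grad=True),
}

frozen = freeze_matching(toy_params.items())
print(frozen)
```

With a PyTorch model, the same helper could be called as freeze_matching(model.named_parameters()), since PyTorch parameters expose requires_grad. Which parameter groups to freeze remains a choice to validate per model.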

Key insights

SFT-induced hallucinations stem from factual forgetting caused by semantic interference; self-distillation or parameter freezing can mitigate them.

Principles

Method

Self-distillation regularizes fine-tuning by constraining output-distribution shifts, using a frozen teacher model's output to guide the student, thereby limiting parameter updates that degrade existing knowledge.
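One common way to constrain output-distribution shift is to add a KL-divergence penalty between the frozen teacher's next-token distribution and the student's. The sketch below shows that formulation for a single token position in plain Python. The summary does not give the paper's exact objective, so the loss form, the weight parameter, and all function names here are assumptions.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) for one next-token distribution."""
    t = log_softmax(teacher_logits)
    s = log_softmax(student_logits)
    return sum(math.exp(tl) * (tl - sl) for tl, sl in zip(t, s))


def self_distillation_loss(student_logits, teacher_logits, target_index, weight=1.0):
    """Cross-entropy on the fine-tuning target plus a drift penalty.

    `weight` (hypothetical) trades off learning the target against staying
    close to the frozen teacher's output distribution.
    """
    cross_entropy = -log_softmax(student_logits)[target_index]
    drift_penalty = kl_divergence(teacher_logits, student_logits)
    return cross_entropy + weight * drift_penalty


# Toy logits over a 3-token vocabulary (made up for illustration).
teacher = [2.0, 0.5, -1.0]
no_drift = self_distillation_loss(teacher, teacher, target_index=0)
drifted = self_distillation_loss([0.0, 3.0, -1.0], teacher, target_index=0)
print(no_drift, drifted)
```

When the student matches the teacher the penalty is zero and only the cross-entropy term remains; the further the student's distribution moves from the teacher's, the larger the penalty grows.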

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.