Two Pathways to Truthfulness: On the Intrinsic Encoding of LLM Hallucinations

2024-11-20 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

Researchers from Peking University and Microsoft Research Asia have identified two distinct internal information pathways through which large language models (LLMs) encode truthfulness signals that are crucial for detecting hallucinations. The Question-Anchored (Q-Anchored) pathway relies on information flow from the input question to the answer; the Answer-Anchored (A-Anchored) pathway derives self-contained evidence from the generated answer itself. The study validated both pathways with attention knockout and token patching experiments across 12 LLMs (including Llama-3.2-1B, Llama-3-70B, Mistral-7B-v0.1, Qwen3-32B) and four datasets (PopQA, TriviaQA, HotpotQA, Natural Questions). The findings indicate that Q-Anchored encoding predominates for well-established facts, while A-Anchored encoding is favored for long-tail knowledge. LLMs also exhibit intrinsic awareness of these pathway distinctions. Building on these insights, the authors propose two methods, Mixture-of-Probes (MoP) and Pathway Reweighting (PR), that improve hallucination detection by up to 10% AUC.

Key takeaway

For AI engineers and research scientists working on LLM reliability, the two truthfulness pathways give a concrete basis for designing hallucination detectors. Consider pathway-aware detection strategies such as Mixture-of-Probes (MoP) or Pathway Reweighting (PR), which the study reports improve detection by up to 10% AUC. Knowing which pathway is active allows more targeted interventions: some truthfulness signals depend on the flow of information from question to answer, while others come from the answer alone, as tends to happen near the limits of the model's knowledge.

Key insights

LLMs encode truthfulness via two distinct pathways: question-dependent and answer-self-contained.

Principles

Truthfulness encoding aligns with LLM knowledge boundaries.
LLMs possess intrinsic awareness of their truthfulness pathways.

Method

Truthfulness pathways are disentangled using attention knockout and token patching, then leveraged for enhanced hallucination detection via Mixture-of-Probes (MoP) and Pathway Reweighting (PR).
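
To make attention knockout concrete, here is a minimal sketch on a toy single-head causal attention layer. It is not the authors' code: the token positions, dimensions, and the helper names `attention_weights` and `knockout_mask` are assumptions chosen for illustration. The idea shown is only the general one, cutting the attention edges from answer positions to question positions so that no information can flow along them.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; -inf scores become exactly 0 weight.
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_weights(q, k, blocked=None):
    """Causal single-head attention weights.

    blocked[i, j] == True removes the edge "position i attends to position j".
    """
    n, d = q.shape
    scores = q @ k.T / np.sqrt(d)
    allowed = np.tril(np.ones((n, n), dtype=bool))  # causal mask
    if blocked is not None:
        allowed = allowed & ~blocked
    scores = np.where(allowed, scores, -np.inf)
    return softmax(scores, axis=-1)

def knockout_mask(n, question_pos, answer_pos):
    # Hypothetical helper: block every answer -> question attention edge.
    blocked = np.zeros((n, n), dtype=bool)
    blocked[np.ix_(answer_pos, question_pos)] = True
    return blocked

# Toy sequence of 6 tokens; positions are illustrative assumptions.
rng = np.random.default_rng(0)
n, d = 6, 8
q = rng.normal(size=(n, d))
k = rng.normal(size=(n, d))
question_pos = [1, 2]
answer_pos = [4, 5]

baseline = attention_weights(q, k)
knocked = attention_weights(q, k, knockout_mask(n, question_pos, answer_pos))

# Attention mass that answer tokens place on question tokens, before and after.
print(baseline[np.ix_(answer_pos, question_pos)].sum(axis=1))
print(knocked[np.ix_(answer_pos, question_pos)].sum(axis=1))
```

In a real experiment the same mask would be applied inside the model's attention layers, and one would compare a truthfulness probe's accuracy with and without the knockout. A large drop would suggest question-dependent encoding, while a small drop would suggest the signal is carried by the answer tokens themselves.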

In practice

Use MoP for specialized hallucination detection.
Apply PR to amplify pathway-relevant truthfulness cues.
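
The summary does not describe how MoP is built, so the following is only one plausible reading: several linear probes over a hidden state, combined by a learned soft gate so that each probe can specialize (for example, one per pathway). The function name `mixture_of_probes`, the two-probe setup, and the random weights are all assumptions; in practice the probe and gate parameters would be trained on labeled truthful and hallucinated answers.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mixture_of_probes(h, probe_w, probe_b, gate_w, gate_b):
    """Hypothetical mixture of linear probes (not the paper's exact method).

    h:        (batch, d) hidden states taken from the LLM
    probe_w:  (n_probes, d) one linear probe per row
    probe_b:  (n_probes,)
    gate_w:   (n_probes, d) gating weights
    gate_b:   (n_probes,)
    Returns (truthfulness score per example, gate weights per example).
    """
    gate = softmax(h @ gate_w.T + gate_b, axis=-1)     # (batch, n_probes)
    probe_scores = sigmoid(h @ probe_w.T + probe_b)    # (batch, n_probes)
    score = (gate * probe_scores).sum(axis=1)          # convex combination
    return score, gate

# Illustrative shapes and untrained random parameters.
rng = np.random.default_rng(0)
batch, d, n_probes = 5, 16, 2
h = rng.normal(size=(batch, d))
probe_w = rng.normal(size=(n_probes, d)) * 0.1
probe_b = np.zeros(n_probes)
gate_w = rng.normal(size=(n_probes, d)) * 0.1
gate_b = np.zeros(n_probes)

score, gate = mixture_of_probes(h, probe_w, probe_b, gate_w, gate_b)
print(score.shape, gate.shape)
```

Because the gate weights sum to one, the combined score stays between 0 and 1 and can be thresholded like the output of a single probe.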

Topics

LLM Hallucination Detection
Intrinsic Truthfulness Encoding
Question-Anchored Pathway
Answer-Anchored Pathway
LLM Knowledge Boundaries

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.