TPA: Next Token Probability Attribution for Detecting Hallucinations in RAG

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing, Data Science & Analytics · Depth: Expert, extended

Summary

SPAD (Seven-Source Token Probability Attribution with Syntactic Aggregation) is a framework for detecting hallucinations in Retrieval-Augmented Generation (RAG) systems through a mechanistic view of token generation. Unlike prior methods that focus on a binary conflict between internal FFN knowledge and retrieved context, SPAD mathematically attributes each token's probability to seven distinct sources: Query, RAG, Past, Current Token, FFN, Final LayerNorm, and Initial Embedding. These attribution scores are then aggregated by Part-of-Speech (POS) tag to surface anomalous linguistic patterns that signal hallucination, such as nouns relying heavily on Final LayerNorm. An XGBoost classifier applied to the resulting 126-dimensional feature vector reportedly achieves state-of-the-art performance on benchmarks such as RAGTruth and Dolly, outperforming existing baselines across Llama2-7B, Llama2-13B, and Llama3-8B. SPAD is also interpretable: it reveals that hallucination signals vary across model architectures and highlights the often-overlooked role of the user query.
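
One way to read the seven-source attribution is as an additive decomposition of a token's probability. The notation below is an assumption made for illustration and is not taken from the original article: $y_t$ is the generated token, $x$ the prompt (query plus retrieved context), $\mathcal{S}$ the set of sources, and $a_s(y_t)$ the share of probability attributed to source $s$.

```latex
\[
P(y_t \mid x, y_{<t}) \;=\; \sum_{s \in \mathcal{S}} a_s(y_t),
\qquad
\mathcal{S} = \{\text{Query},\ \text{RAG},\ \text{Past},\ \text{Current Token},\
\text{FFN},\ \text{Final LayerNorm},\ \text{Initial Embedding}\}
\]
```

Under this reading, a hallucination signal is an unusual distribution of the shares $a_s$ for a given part of speech, not an unusual total probability.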

Key takeaway

For AI Engineers and Research Scientists developing or deploying RAG systems, SPAD offers a more robust and interpretable approach to hallucination detection. By analyzing the seven distinct sources of token probability and their syntactic context, you can move beyond proxy signals to pinpoint the architectural causes of factual errors. This enables more precise debugging and the potential for real-time mitigation strategies, improving the reliability of your LLM applications.

Key insights

SPAD attributes token probabilities to seven sources and aggregates by POS tags to detect RAG hallucinations.

Method

SPAD decomposes token probabilities into seven sources, then aggregates these attributions by Part-of-Speech tags to create a feature vector for an XGBoost classifier, identifying anomalies indicative of hallucinations.
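
The aggregation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the 18-tag inventory `POS_TAGS` is hypothetical (chosen only because 7 sources × 18 tags gives the 126 dimensions mentioned in the summary), mean pooling is an assumed aggregation, and the function name `aggregate_by_pos` and the toy attribution values are invented for the example.

```python
# Seven attribution sources named in the summary.
SOURCES = [
    "Query", "RAG", "Past", "Current Token",
    "FFN", "Final LayerNorm", "Initial Embedding",
]

# Hypothetical 18-tag POS inventory (an assumption; the summary only
# implies 126 / 7 = 18 tag slots, not which tags they are).
POS_TAGS = [
    "ADJ", "ADP", "ADV", "AUX", "CCONJ", "DET", "INTJ", "NOUN", "NUM",
    "PART", "PRON", "PROPN", "PUNCT", "SCONJ", "SYM", "VERB", "X", "OTHER",
]


def aggregate_by_pos(token_attributions, pos_tags):
    """Average per-token source attributions within each POS tag.

    Returns a flat list of len(POS_TAGS) * len(SOURCES) values, ordered
    tag by tag, with zeros for tags that do not occur in the response.
    """
    if len(token_attributions) != len(pos_tags):
        raise ValueError("need exactly one POS tag per token")

    sums = {tag: [0.0] * len(SOURCES) for tag in POS_TAGS}
    counts = {tag: 0 for tag in POS_TAGS}

    for attribution, tag in zip(token_attributions, pos_tags):
        if len(attribution) != len(SOURCES):
            raise ValueError("each token needs one score per source")
        tag = tag if tag in sums else "OTHER"  # unknown tags share one slot
        counts[tag] += 1
        for i, value in enumerate(attribution):
            sums[tag][i] += value

    features = []
    for tag in POS_TAGS:
        n = counts[tag]
        features.extend(total / n if n else 0.0 for total in sums[tag])
    return features


# Toy response of three tokens; the scores are made up for illustration.
toy_attributions = [
    [0.10, 0.50, 0.10, 0.05, 0.15, 0.05, 0.05],  # noun grounded in RAG
    [0.05, 0.05, 0.10, 0.05, 0.15, 0.55, 0.05],  # noun leaning on Final LayerNorm
    [0.10, 0.10, 0.40, 0.10, 0.20, 0.05, 0.05],  # verb driven by past tokens
]
toy_pos = ["NOUN", "NOUN", "VERB"]

features = aggregate_by_pos(toy_attributions, toy_pos)
noun_layernorm = features[POS_TAGS.index("NOUN") * len(SOURCES)
                          + SOURCES.index("Final LayerNorm")]
```

A vector built this way has a fixed length regardless of response length, which is what lets a tabular classifier such as XGBoost consume it. A high value in a slot like `noun_layernorm` is the kind of pattern the summary describes as a hallucination signal.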

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.