Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics
Summary
A new framework changes how Large Language Model (LLM) hallucination detectors are evaluated, shifting the focus from token-level classification accuracy (AUC) to reaction time: the delay between hallucination onset and alarm. The work formulates hallucination onset detection as a quickest change detection problem and validates a first-order Markov model for the latent faithful/hallucinated state on RAGTruth data. It places Lorden's lower bound on detection delay at approximately 1.3 tokens for a 0.01 false-alarm rate. A causal recurrent labeler, acting as a learned CUSUM, raises an alarm within 11–13 tokens, compared with 31 tokens for a linear per-token baseline. The remaining order-of-magnitude gap to the theoretical bound is attributed to the learned score realizing only 1/4.5 of the features' information divergence, a deficit that recalibration cannot remove.
Key takeaway
ML engineers developing LLM hallucination monitors should shift their evaluation from token-level AUC to sequential analysis, focusing on expected detection delay (EDD) and average run length to false alarm (ARL0). Under these metrics, current detectors are far slower than the theoretical limit, and the gap stems from the information rate realized by the learned score rather than from architectural depth. Prioritize discriminative features and learned CUSUM scoring to reduce detection latency on streaming LLM outputs.
Key insights
Hallucination detection is a sequential problem: it calls for quickest change detection metrics rather than traditional classification scores.
Principles
- The latent faithful/hallucinated state of each token is modeled as a first-order Markov chain.
- Detection delay is bounded below by a quantity governed by the Kullback–Leibler divergence between the faithful and hallucinated feature laws.
- Causal recurrent labelers can act as learned CUSUMs for sequential detection.
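For orientation, the textbook i.i.d. form of Lorden's asymptotic lower bound is sketched below. The notation is an assumption, not taken from the article: $f_0$ and $f_1$ are the pre- and post-change feature laws, $T$ is the alarm time, $\gamma$ is the ARL0 constraint, and $\mathrm{WADD}$ is Lorden's worst-case average detection delay. The paper's Markov formulation may differ in detail.

```latex
\[
\inf_{T \,:\, \mathbb{E}_\infty[T] \ge \gamma} \mathrm{WADD}(T)
\;\ge\; \frac{\log \gamma}{D(f_1 \,\|\, f_0)}\,\bigl(1 + o(1)\bigr),
\qquad \gamma \to \infty,
\]
\[
D(f_1 \,\|\, f_0) \;=\; \mathbb{E}_{f_1}\!\left[\log \frac{f_1(X)}{f_0(X)}\right].
\]
```

To first order, the achievable delay scales inversely with the divergence, so a score that captures only a fraction of it pays a proportionally longer delay at the same false-alarm constraint.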
Method
Formulate hallucination onset as a quickest change detection problem, and use a causal recurrent labeler as a learned CUSUM to minimize expected detection delay (EDD) subject to a fixed average run length to false alarm (ARL0).
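The classical CUSUM recursion that the learned labeler is said to emulate can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function name `cusum_alarm`, the example scores, and the threshold are all hypothetical, and the per-token scores stand in for log-likelihood ratios (negative while the output is faithful, positive after onset).

```python
from typing import Iterable, Optional


def cusum_alarm(scores: Iterable[float], threshold: float) -> Optional[int]:
    """Return the 0-based index of the first alarm, or None if none fires."""
    stat = 0.0
    for t, s in enumerate(scores):
        # Page's recursion: accumulate evidence and clip at zero so that
        # long faithful stretches do not build up negative credit.
        stat = max(0.0, stat + s)
        if stat >= threshold:
            return t
    return None


# Hypothetical per-token scores with hallucination onset at index 4.
scores = [-0.5, -0.2, -0.4, -0.1, 0.9, 1.1, 0.8, 1.2]
onset = 4
alarm = cusum_alarm(scores, threshold=2.5)
if alarm is not None:
    print(f"alarm at token {alarm}, delay {alarm - onset} tokens")
```

Raising the threshold lengthens the run length to false alarm but also lengthens the delay; that trade-off is what EDD at fixed ARL0 measures.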
In practice
- Evaluate LLM monitors by detection delay, not just AUC.
- Prioritize feature discriminability to reduce detection latency.
- Use causal recurrent networks in place of per-token linear scorers for sequential scoring.
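A delay-based evaluation along these lines might look like the sketch below. It is a simplified empirical proxy under stated assumptions, not the paper's protocol: the function name `delay_and_false_alarms` and the example data are hypothetical, each sequence has at most one annotated onset, and only the first alarm per sequence is considered.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple


def delay_and_false_alarms(
    onsets: Sequence[Optional[int]],
    alarms: Sequence[Optional[int]],
) -> Tuple[Optional[float], int, int]:
    """Summarize first alarms against annotated onsets.

    onsets[i]: token index where hallucination starts in sequence i, or None.
    alarms[i]: token index of the first alarm in sequence i, or None.
    Returns (mean delay over detected onsets, false alarms, missed onsets).
    """
    delays = []
    false_alarms = 0
    missed = 0
    for onset, alarm in zip(onsets, alarms):
        if alarm is not None and (onset is None or alarm < onset):
            false_alarms += 1  # fired before, or without, any onset
        elif onset is not None and alarm is None:
            missed += 1  # onset present but never flagged
        elif onset is not None and alarm is not None:
            delays.append(alarm - onset)
    return (mean(delays) if delays else None, false_alarms, missed)


# Hypothetical annotations and alarm times for four sequences.
onsets = [4, None, 10, 7]
alarms = [6, 3, 9, None]
summary = delay_and_false_alarms(onsets, alarms)
```

Reporting the mean delay alongside the false-alarm count keeps the comparison honest: a detector can always shorten its delay by alarming more often.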
Topics
- Quickest Change Detection
- LLM Hallucination Detection
- CUSUM Algorithms
- Sequential Analysis
- Kullback–Leibler Divergence
- Recurrent Neural Networks
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.