Quickest Detection of Hallucination Onset: Delay Bounds and Learned CUSUM Statistics

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

A new framework changes how Large Language Model (LLM) hallucination detectors are evaluated. It shifts the focus from token-level classification accuracy (AUC) to reaction time, meaning the delay between hallucination onset and the alarm. The work formulates hallucination onset detection as a quickest change detection problem and validates a first-order Markov model for the latent faithful/hallucinated state on RAGTruth data. It establishes Lorden's lower bound on detection delay at approximately 1.3 tokens for a 0.01 false-alarm rate. A causal recurrent labeler, acting as a learned CUSUM, detects onset within 11–13 tokens, well ahead of a linear per-token baseline (31 tokens). The authors attribute the remaining order-of-magnitude gap to the bound to the learned score realizing only about 1/4.5 of the features' information divergence, a deficit that recalibration cannot remove.
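
Lorden's bound links these numbers through the Kullback–Leibler divergence D between the post-change and pre-change feature laws. The following is a hedged worked reading, not the paper's derivation: it assumes the 0.01 false-alarm rate corresponds to ARL0 = 1/0.01 = 100 tokens and uses the asymptotic first-order form of the bound.

```latex
\mathrm{EDD}(\tau) \;\ge\; \frac{\log \mathrm{ARL}_0}{D(f_1 \,\|\, f_0)}\,\bigl(1 + o(1)\bigr)
\quad (\mathrm{ARL}_0 \to \infty),
\qquad
\frac{\log 100}{D} \approx 1.3
\;\Longrightarrow\;
D \approx \frac{4.61}{1.3} \approx 3.5 \ \text{nats per token}
```

Under that assumption, a score that realizes only a fraction of D raises this first-order delay floor in inverse proportion, which is why the realized information rate, not threshold calibration, governs the achievable delay.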

Key takeaway

For ML engineers building LLM hallucination monitors: shift evaluation from token-level AUC to sequential analysis, measuring expected detection delay (EDD) at a fixed average run length to false alarm (ARL0). Under this lens, current detectors are far slower than the theoretical limit, and the shortfall traces mainly to the information rate the learned score realizes rather than to architectural depth. Prioritize discriminative features and learned CUSUM scoring to minimize detection latency on streaming LLM outputs.
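
Both metrics can be computed directly from logged monitor runs. The sketch below assumes a hypothetical log format (`StreamRecord`, with annotated onset and first-alarm token indices, e.g. derived from span-level labels). It counts the onset token itself, so an alarm on the onset token has delay 1, and it estimates ARL0 as faithful tokens observed per false alarm, which treats false alarms as geometric (memoryless); that is an assumption, not something the paper states.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StreamRecord:
    """One monitored generation (hypothetical log format, 0-based token indices)."""
    length: int            # number of generated tokens
    onset: Optional[int]   # first hallucinated token, None if the stream stayed faithful
    alarm: Optional[int]   # first token on which the monitor fired, None if it never fired


def detection_delays(records: List[StreamRecord]) -> List[int]:
    """Delays for streams with an onset whose first alarm came at or after it."""
    return [r.alarm - r.onset + 1 for r in records
            if r.onset is not None and r.alarm is not None and r.alarm >= r.onset]


def expected_detection_delay(records: List[StreamRecord]) -> float:
    delays = detection_delays(records)
    return sum(delays) / len(delays) if delays else float("nan")


def arl0_estimate(records: List[StreamRecord]) -> float:
    """Faithful tokens observed per false alarm (geometric run-length assumption)."""
    exposure, false_alarms = 0, 0
    for r in records:
        # Tokens that were still faithful: the whole stream, or everything before onset.
        faithful_len = r.length if r.onset is None else r.onset
        if r.alarm is not None and r.alarm < faithful_len:
            exposure += r.alarm + 1      # observed up to and including the false alarm
            false_alarms += 1
        else:
            exposure += faithful_len     # no false alarm: censored at onset or stream end
    return exposure / false_alarms if false_alarms else float("inf")


logs = [
    StreamRecord(length=50, onset=10, alarm=14),
    StreamRecord(length=50, onset=20, alarm=22),
    StreamRecord(length=40, onset=None, alarm=None),
    StreamRecord(length=40, onset=None, alarm=29),
    StreamRecord(length=60, onset=30, alarm=5),
]
print("EDD:", expected_detection_delay(logs), "ARL0:", arl0_estimate(logs))
# prints: EDD: 4.0 ARL0: 53.0
```

Only the first alarm per stream is used; a monitor that keeps running after a false alarm would need a renewal-style count instead.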

Key insights

Hallucination detection is a sequential problem, requiring quickest change detection metrics over traditional classification scores.

Principles

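The latent faithful/hallucinated state evolves token by token as a first-order Markov chain.
Detection delay is lower-bounded by log(ARL0) divided by the Kullback–Leibler divergence between the post-change and pre-change feature laws (Lorden's bound).
Causal recurrent labelers can act as learned CUSUMs, approximating the optimal sequential detection statistic.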
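
The three principles fit together in a toy model. The sketch below is an illustration under assumed parameters, not the paper's feature model: the latent state follows a two-state first-order Markov chain, the per-token feature is Gaussian, x_t ~ N(mu * state_t, 1), and the CUSUM increment is the exact log-likelihood ratio, whose post-change mean equals the KL divergence mu^2/2 that enters Lorden's bound. A learned score would stand in for the hypothetical `llr()` helper.

```python
import math
import random
from typing import List, Optional


def sample_states(n: int, p_fh: float, p_hf: float, rng: random.Random) -> List[int]:
    """Two-state first-order Markov chain (0 = faithful, 1 = hallucinated), starting
    faithful; p_fh = P(next = 1 | now = 0), p_hf = P(next = 0 | now = 1)."""
    states, s = [], 0
    for _ in range(n):
        states.append(s)
        u = rng.random()
        if s == 0 and u < p_fh:
            s = 1
        elif s == 1 and u < p_hf:
            s = 0
    return states


def llr(x: float, mu: float) -> float:
    """log N(x; mu, 1) - log N(x; 0, 1): the ideal per-token CUSUM increment."""
    return mu * x - 0.5 * mu * mu


def kl_gauss(mu: float) -> float:
    """KL(N(mu, 1) || N(0, 1)) in nats, which is also the post-change mean of llr()."""
    return 0.5 * mu * mu


def cusum_first_alarm(xs: List[float], mu: float, h: float) -> Optional[int]:
    """0-based index at which W_t = max(0, W_{t-1} + llr(x_t)) first reaches h."""
    w = 0.0
    for t, x in enumerate(xs):
        w = max(0.0, w + llr(x, mu))
        if w >= h:
            return t
    return None


rng = random.Random(0)
mu, h = 1.5, math.log(100)            # illustrative values; h = log(target ARL0)
states = sample_states(200, p_fh=0.02, p_hf=0.0, rng=rng)  # p_hf = 0: onset is absorbing
xs = [rng.gauss(mu * s, 1.0) for s in states]
onset = states.index(1) if 1 in states else None
print("onset:", onset, "alarm:", cusum_first_alarm(xs, mu, h),
      "KL per token:", kl_gauss(mu), "first-order delay floor:", h / kl_gauss(mu))
```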

Method

Formulate hallucination onset as a quickest change detection problem, using a causal recurrent labeler as a learned CUSUM to minimize expected detection delay (EDD) at a fixed average run length to false alarm (ARL0).
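
Comparing EDD at a fixed ARL0 requires calibrating each detector's threshold to the false-alarm budget first. A minimal Monte Carlo sketch, assuming the score model is available as samplers for its faithful and hallucinated distributions. Here they are Gaussian stand-ins, N(-0.5, 1) and N(0.5, 1), the exact log-likelihood ratios for a unit mean shift; a learned score would replace them, and all helper names are hypothetical. Seeding each simulated path separately fixes its score sequence for every h, so each run length, and hence the estimated ARL0, is nondecreasing in h and bisection is well defined.

```python
import math
import random
from typing import Callable

Sampler = Callable[[random.Random], float]


def run_length(h: float, pre: Sampler, post: Sampler, onset: int,
               rng: random.Random, cap: int) -> int:
    """1-based token index where W_t = max(0, W_{t-1} + s_t) first reaches h.
    Tokens 1..onset draw from `pre`, later tokens from `post`; censored at `cap`."""
    w = 0.0
    for t in range(1, cap + 1):
        s = pre(rng) if t <= onset else post(rng)
        w = max(0.0, w + s)
        if w >= h:
            return t
    return cap


def mean_arl0(h: float, pre: Sampler, reps: int, seed: int, cap: int = 100_000) -> float:
    # One seed per path (common random numbers across thresholds).
    return sum(run_length(h, pre, pre, cap, random.Random(seed + k), cap)
               for k in range(reps)) / reps


def calibrate_threshold(pre: Sampler, target_arl0: float, reps: int = 200, seed: int = 0,
                        tol: float = 0.01, max_h: float = 64.0) -> float:
    """Smallest h (within tol) whose estimated ARL0 meets the target."""
    lo, hi = 0.0, 1.0
    while mean_arl0(hi, pre, reps, seed) < target_arl0:   # bracket by doubling
        if hi > max_h:
            raise ValueError("target ARL0 not reached; pre-change scores should drift negative")
        lo, hi = hi, 2.0 * hi
    while hi - lo > tol:                                   # invariant: hi meets the target
        mid = 0.5 * (lo + hi)
        if mean_arl0(mid, pre, reps, seed) >= target_arl0:
            hi = mid
        else:
            lo = mid
    return hi


def edd(h: float, pre: Sampler, post: Sampler, reps: int = 200, seed: int = 10_000,
        cap: int = 100_000) -> float:
    """Delay with the change at the first token, the worst case for a CUSUM started at 0."""
    return sum(run_length(h, pre, post, 0, random.Random(seed + k), cap)
               for k in range(reps)) / reps


def pre(r: random.Random) -> float:
    return r.gauss(-0.5, 1.0)   # stand-in score distribution while faithful


def post(r: random.Random) -> float:
    return r.gauss(0.5, 1.0)    # stand-in score distribution after onset


h = calibrate_threshold(pre, target_arl0=100)
print(f"h = {h:.2f}, ARL0 ~ {mean_arl0(h, pre, 200, 0):.0f}, EDD ~ {edd(h, pre, post):.1f}")
```

For a real monitor, `pre` would be replaced by replaying faithful generations through the labeler; because the calibrated h always meets the budget, detectors can then be ranked on EDD alone.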

In practice

Evaluate LLM monitors by detection delay, not just AUC.
Prioritize feature discriminability to reduce detection latency.
Use causal recurrent scorers that accumulate evidence across tokens rather than thresholding each token independently.
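
The gap between per-token and accumulated scoring can be reproduced in a toy setting. The sketch assumes Gaussian features with a unit-variance mean shift mu at onset and compares, at the same ARL0, a memoryless per-token threshold (a stand-in for a per-token baseline, not the paper's linear probe) with CUSUM on the exact log-likelihood ratio. Setting h = log(ARL0) relies on the classical guarantee ARL0 >= e^h for an LLR-driven CUSUM, so the CUSUM delay reported here is, if anything, pessimistic.

```python
import math
import random
from statistics import NormalDist


def shewhart_edd(mu: float, arl0: float) -> float:
    """Memoryless rule: alarm when x_t >= c, with c chosen so that ARL0 is exact.
    Under N(0, 1) the run length is geometric with per-token alarm probability
    1/arl0; after a shift to N(mu, 1) the expected delay is 1 / P_1(x >= c)."""
    c = NormalDist().inv_cdf(1.0 - 1.0 / arl0)
    return 1.0 / (1.0 - NormalDist(mu, 1.0).cdf(c))


def cusum_edd(mu: float, arl0: float, reps: int = 300, seed: int = 0) -> float:
    """Mean run length of an LLR CUSUM with h = log(arl0), change at the first token."""
    h = math.log(arl0)
    total = 0
    for k in range(reps):
        rng = random.Random(seed + k)
        w, t = 0.0, 0
        while w < h:
            t += 1
            x = rng.gauss(mu, 1.0)                    # post-change feature
            w = max(0.0, w + mu * x - 0.5 * mu * mu)  # add the per-token LLR, floor at 0
        total += t
    return total / reps


mu, arl0 = 0.5, 1000.0
print(f"per-token threshold EDD ~ {shewhart_edd(mu, arl0):.0f} tokens")
print(f"CUSUM EDD ~ {cusum_edd(mu, arl0):.0f} tokens")
```

The per-token rule has to wait for a single rare extreme value, while CUSUM sums many weak pieces of evidence; in this toy model the advantage is largest for small shifts and tight false-alarm budgets.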

Topics

Quickest Change Detection
LLM Hallucination Detection
CUSUM Algorithms
Sequential Analysis
Kullback–Leibler Divergence
Recurrent Neural Networks

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.