How Language Models Fail: Token-Level Signatures of Committed and Persistent Reasoning Failures

2026-06-08 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

A new framework characterizes language model reasoning failures using token-level uncertainty signals and identifies two distinct processes: "committed failure" and "persistent uncertainty." In committed failure, the model locks onto an incorrect reasoning path early, and the trace contains a "commitment point" at which uncertainty signals are maximally predictive. In persistent uncertainty, uncertainty accumulates throughout the trace, so detection requires the full sequence. The framework was validated empirically across 23 model-dataset configurations spanning five model families and four reasoning domains, and its falsifiable predictions held in 20 of the 23 cases. The failure-mode classification also has direct implications for self-consistency: it indicates when uncertainty signals complement self-consistency and when self-consistency can be selectively skipped.

Key takeaway

Machine learning engineers deploying LLMs for complex reasoning tasks can characterize failure modes (committed vs. persistent) using token-level uncertainty signals from single completions. Knowing the mode lets you adapt the detection strategy, potentially skipping self-consistency for committed failures or combining it with uncertainty features for persistent ones, which can improve reliability and efficiency in deployment.

Key insights

LLM reasoning failures manifest as "committed" (early lock-in) or "persistent" (accumulating uncertainty), detectable via token-level signals.

Principles

Failure processes leave identifiable token-level signatures in reasoning traces.
A single failure detection strategy is not optimal across all failure types.
Early commitment to a reasoning path restricts the effectiveness of corrections.

Method

Characterize LLM reasoning failures by computing token-level uncertainty signals (Entropy, Margin, NLL, Nucleus, Near-Tie) over prefixes of chain-of-thought traces, then using the PR-AUC (area under the precision-recall curve) obtained at each prefix to identify the "committed" or "persistent" mode.
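
A minimal Python sketch of this kind of analysis follows. It is illustrative only and is not taken from the paper or from the sisl/LMTwoFailureModeFramework repository. The summary names the five signals but does not define them, so the definitions below are common textbook readings and should be treated as assumptions. The function names, the top_p and tie_eps thresholds, the prefix fractions, and the "peak at the full trace means persistent" decision rule are likewise hypothetical simplifications.

```python
import math

def token_signals(probs, chosen, top_p=0.9, tie_eps=0.1):
    """Uncertainty signals for one decoding step (assumed definitions).

    probs  : next-token probabilities for this step (sum to ~1)
    chosen : index of the emitted token (its probability must be > 0)
    """
    ranked = sorted(probs, reverse=True)
    entropy = -sum(p * math.log(p) for p in probs if p > 0)
    margin = ranked[0] - (ranked[1] if len(ranked) > 1 else 0.0)
    nll = -math.log(probs[chosen])
    # Nucleus size: fewest top-ranked tokens whose mass reaches top_p.
    cum, nucleus = 0.0, 0
    for p in ranked:
        cum += p
        nucleus += 1
        if cum >= top_p:
            break
    return {"entropy": entropy, "margin": margin, "nll": nll,
            "nucleus": nucleus, "near_tie": margin < tie_eps}

def prefix_score(step_probs, chosen_ids, fraction, signal="entropy"):
    """Mean of one signal over the first `fraction` of a non-empty trace."""
    n = max(1, math.ceil(fraction * len(step_probs)))
    vals = [float(token_signals(p, c)[signal])
            for p, c in zip(step_probs[:n], chosen_ids[:n])]
    return sum(vals) / len(vals)

def pr_auc(scores, failed):
    """Average precision, a step-wise estimate of PR-AUC.

    Higher score should mean "more likely to fail"; ties keep input order.
    """
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total, ap = 0, sum(failed), 0.0
    for rank, i in enumerate(order, start=1):
        if failed[i]:
            hits += 1
            ap += hits / rank
    return ap / total if total else float("nan")

def failure_mode(traces, failed, fractions=(0.25, 0.5, 0.75, 1.0)):
    """Label a set of traces as 'committed' or 'persistent' (assumed rule).

    traces : list of (step_probs, chosen_ids) pairs
    failed : 1 if the trace's final answer was wrong, else 0
    """
    curve = [pr_auc([prefix_score(sp, ch, f) for sp, ch in traces], failed)
             for f in fractions]
    # Earliest prefix that attains the best PR-AUC.
    best = max(range(len(fractions)), key=lambda i: curve[i])
    mode = "persistent" if best == len(fractions) - 1 else "committed"
    return mode, fractions[best], curve
```

Under these assumptions, a PR-AUC curve that peaks before the end of the trace is read as a commitment point, while a curve that only peaks at the full trace is read as persistent uncertainty. A real analysis would likely use all five signals and a trained classifier rather than a single averaged signal.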

In practice

Adapt failure detection strategies based on the identified failure mode.
Combine single-completion uncertainty features with self-consistency for improved prediction.
Selectively skip self-consistency on confident inputs in committed failure regimes.
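
One way the selective-skipping idea might look in code is sketched below. This is an assumption-laden illustration rather than the paper's procedure: the generate callable, the regime label, the threshold value, and the sample count k are all hypothetical placeholders the caller would supply and tune.

```python
from collections import Counter

def answer_with_selective_sc(prompt, generate, regime, threshold=0.3, k=5):
    """Return (answer, number_of_completions_used).

    generate(prompt) -> (answer, uncertainty) is a hypothetical callable;
    `uncertainty` is any single-completion score where lower means more
    confident. `regime` is 'committed' or 'persistent', decided offline.
    """
    answer, uncertainty = generate(prompt)
    # Committed regime and a confident completion: skip self-consistency.
    if regime == "committed" and uncertainty < threshold:
        return answer, 1
    # Otherwise draw k - 1 more completions and take a majority vote.
    votes = [answer] + [generate(prompt)[0] for _ in range(k - 1)]
    return Counter(votes).most_common(1)[0][0], k
```

In this sketch the first completion is reused as one of the k votes, so the cost saving comes entirely from inputs that pass the confidence check. In a persistent regime the function always samples, which matches the suggestion to combine self-consistency with uncertainty features rather than replace it.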

Topics

Language Model Failures
Token-Level Uncertainty
Chain-of-Thought Reasoning
Self-Consistency
Failure Detection
LLM Evaluation

Code references

sisl/LMTwoFailureModeFramework

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.