Hallucination as Trajectory Commitment: Causal Evidence for Asymmetric Attractor Dynamics in Transformer Generation

Source: cs.AI updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Natural Language Processing) · Depth: Expert, extended

Summary

Research on the Qwen2.5-1.5B language model provides causal evidence that hallucination is an "early trajectory commitment" governed by asymmetric attractor dynamics. Using a "same-prompt bifurcation" method, the study found that 27 of 61 prompts (44.3%) spontaneously diverged into factual and hallucinated trajectories at the first generated token, accompanied by a sharp rise in KL divergence. Activation patching revealed a strong causal asymmetry: injecting a hallucinated activation into a correct trajectory corrupted the output in 87.5% of trials (layer 20), while the reverse operation, patching a correct activation into a hallucinated trajectory, restored the output in only 33.3% of trials (layer 24). This roughly 2.6x gap indicates that hallucination acts as a locally stable attractor basin: easy to enter but difficult to escape, so that correction requires sustained multi-step intervention. Furthermore, step-0 residual states predict per-prompt hallucination rates (Pearson r=0.776 at layer 15), and unsupervised clustering identifies five regime-like groups, including a "saddle cluster" that contains 12 of the 13 bifurcating false-premise prompts.
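The figures above can be checked with a few lines of arithmetic, and the divergence measure can be made concrete. The following is a minimal sketch, not the study's code: `kl_divergence` is a plain textbook implementation, and the two three-token distributions are hypothetical stand-ins for the step-0 next-token distributions of a factual and a hallucinated branch.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Return KL(p || q) in nats. Assumes q > 0 wherever p > 0."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0.0:
            continue  # the term 0 * log(0 / q) is taken as 0
        if qi == 0.0:
            raise ValueError("q must be positive wherever p is positive")
        total += pi * math.log(pi / qi)
    return total


# Figures reported in the summary.
bifurcation_rate = 27 / 61          # prompts that split at the first token
corruption_rate = 0.875             # hallucinated -> correct patch, layer 20
correction_rate = 0.333             # correct -> hallucinated patch, layer 24
asymmetry_ratio = corruption_rate / correction_rate

# Hypothetical step-0 distributions over a 3-token vocabulary (assumption,
# for illustration only; not data from the study).
p_factual = [0.7, 0.2, 0.1]
p_hallucinated = [0.1, 0.2, 0.7]
step0_kl = kl_divergence(p_factual, p_hallucinated)

print(f"bifurcation rate: {100 * bifurcation_rate:.1f}%")
print(f"asymmetry ratio:  {asymmetry_ratio:.1f}x")
print(f"toy step-0 KL:    {step0_kl:.3f} nats")
```

KL divergence is not symmetric in its arguments, so an analysis of this kind needs to fix which branch is treated as the reference distribution.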

Key takeaway

For AI Engineers and Research Scientists developing or deploying LLMs, this research indicates that hallucination is a trajectory-level dynamic that is committed to early in generation, not merely a knowledge retrieval failure. You should prioritize early-stage intervention strategies, potentially at the prompt encoding level, to steer models away from hallucination-prone regimes. Be aware that single-point corrections succeeded in only a third of trials in this study; robust solutions will likely require coordinated, multi-step interventions to escape established hallucination attractors.

Key insights

Hallucination in LLMs is an early, asymmetric attractor phenomenon, easy to trigger but hard to correct.

Method

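Same-prompt bifurcation holds the prompt fixed and compares its factual and hallucinated continuations, which isolates trajectory dynamics from prompt content. Symmetric causal patching swaps activations in both directions (hallucinated into correct, and correct into hallucinated) to measure directional asymmetry. Unsupervised clustering of step-0 residual states identifies hallucination risk regimes.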
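The patching protocol can be sketched on a toy model to show the bookkeeping involved: cache a donor run's activation at one layer, overwrite the recipient run's activation at that layer, and record whether the output label flips. Everything here is an assumption made for illustration: `toy_model`, its tanh update rule, the two-value state, and `patch_success_rate` are hypothetical and are not the study's model or code.

```python
import math
from typing import Dict, List, Optional, Tuple

State = Tuple[float, float]  # (context value, last-position value)


def toy_model(
    x: State,
    patch: Optional[Tuple[int, float]] = None,
    cache: Optional[Dict[int, float]] = None,
    n_layers: int = 4,
) -> str:
    """Hypothetical 4-layer toy: the last-position value is updated per layer."""
    ctx, last = x
    for layer in range(n_layers):
        last = math.tanh(last + 0.5 * ctx)
        if patch is not None and patch[0] == layer:
            last = patch[1]  # overwrite only the last-position activation
        if cache is not None:
            cache[layer] = last  # record the (possibly patched) activation
    return "factual" if last >= 0 else "hallucinated"


def patch_success_rate(
    recipients: List[State], donors: List[State], layer: int, target_label: str
) -> float:
    """Fraction of trials where patching the donor's activation yields target_label."""
    hits = 0
    for recipient, donor in zip(recipients, donors):
        donor_cache: Dict[int, float] = {}
        toy_model(donor, cache=donor_cache)  # clean donor run, cache activations
        label = toy_model(recipient, patch=(layer, donor_cache[layer]))
        hits += label == target_label
    return hits / len(recipients)


# Hypothetical paired trajectories (toy inputs, not real prompts).
factual_runs = [(1.0, 0.5)]
hallucinated_runs = [(-1.0, -0.5)]

# Direction 1: inject a hallucinated activation into a factual run.
corruption_late = patch_success_rate(factual_runs, hallucinated_runs, 3, "hallucinated")
corruption_early = patch_success_rate(factual_runs, hallucinated_runs, 0, "hallucinated")
# Direction 2: inject a factual activation into a hallucinated run.
correction_late = patch_success_rate(hallucinated_runs, factual_runs, 3, "factual")
```

This toy is symmetric by construction, so both directions succeed at the same rate at any given layer; it only illustrates how the two rates are measured. The asymmetry reported in the study (87.5% corruption versus 33.3% correction) is an empirical property of Qwen2.5-1.5B and is not reproduced here.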

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.