When Attention Closes: How LLMs Lose the Thread in Multi-Turn Interaction

Source: cs.AI updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

Large language models (LLMs) follow single-turn instructions reliably but degrade over long multi-turn interactions, losing track of initial instructions, persona, and rules. This study offers a mechanistic explanation: a "channel-transition" account in which goal-defining tokens become less accessible through attention while goal-related information may persist in residual representations. The researchers introduce the Goal Accessibility Ratio (GAR), which measures attention to task-defining goal tokens, and combine it with sliding-window ablations and residual-stream probes. GAR declines monotonically across all tested architectures (Mistral, Qwen, LLaMA, Mixtral), and forcing the attention channel closed with a sliding-window intervention causes predictable behavioral failures. Even after attention to the goal is lost, some models preserve substantial goal-conditioned behavior: linear probes recover per-episode recall outcomes from residual representations with AUC up to 0.99, and the depth at which this information is encoded is architecture-specific.

Key takeaway

For AI engineers building long-horizon conversational agents, a larger context window is not enough. Prioritize architectural designs that keep goal representations robust in the residual stream even after direct attention to the initial instructions fades. This helps maintain consistent instruction following and persona compliance in prolonged interactions and reduces degradation.

Key insights

Degradation of multi-turn instruction following in LLMs stems from attention channel closure; what the residual stream encodes determines behavior after closure.

Method

The Goal Accessibility Ratio (GAR) quantifies attention to goal tokens. Sliding-window ablations close the attention channel as a causal intervention. Linear probes on residual activations measure how much goal information those activations encode.
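
As a rough illustration of the first two measurements, the sketch below computes the share of attention that a query position places on goal tokens, then shows how a sliding-window mask drives that share to zero once the goal tokens fall outside the window. The paper's exact GAR formula is not reproduced here. The function names (`causal_mask`, `attention_weights`, `goal_attention_share`), the toy sequence length, the goal positions, and the uniform attention scores are all assumptions made for illustration.

```python
import numpy as np


def causal_mask(seq_len, window=None):
    """Boolean mask where entry [q, k] is True if query q may attend to key k.

    With window=None this is ordinary causal attention. With an integer
    window, each query sees only its most recent `window` keys (itself
    included), which is the sliding-window restriction.
    """
    q = np.arange(seq_len)[:, None]
    k = np.arange(seq_len)[None, :]
    mask = k <= q
    if window is not None:
        mask = mask & ((q - k) < window)
    return mask


def attention_weights(scores, mask):
    """Row-wise softmax over the keys that the mask allows."""
    masked = np.where(mask, scores, -np.inf)
    # Subtract the row maximum for numerical stability; every row keeps at
    # least its diagonal entry, so the maximum is always finite.
    masked = masked - masked.max(axis=-1, keepdims=True)
    weights = np.exp(masked)
    return weights / weights.sum(axis=-1, keepdims=True)


def goal_attention_share(weights, goal_positions, query_pos):
    """Hypothetical GAR-style quantity: attention mass on the goal tokens."""
    return float(weights[query_pos, goal_positions].sum())


# Toy setup (assumed): 64 tokens, the first 4 of which define the goal.
SEQ_LEN = 64
GOAL_POSITIONS = [0, 1, 2, 3]

# Uniform scores stand in for real attention logits as a null baseline.
scores = np.zeros((SEQ_LEN, SEQ_LEN))

full = attention_weights(scores, causal_mask(SEQ_LEN))
windowed = attention_weights(scores, causal_mask(SEQ_LEN, window=16))

for q in (7, 31, 63):
    print(
        q,
        goal_attention_share(full, GOAL_POSITIONS, q),
        goal_attention_share(windowed, GOAL_POSITIONS, q),
    )
```

Under these uniform scores the full-attention share simply dilutes as the context grows, while the windowed share is exactly zero for any query more than 16 positions past the goal tokens. In a real model the weights would come from the model's own attention maps, per layer and head, instead of uniform scores.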

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.