Slot Machines: How LLMs Keep Track of Multiple Entities

Source: cs.CL updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

Large Language Models (LLMs) track multiple entities in a context using distinct "slots": one for the entity currently being described and one for the entity immediately before it. This is the finding of a study that applies a novel multi-slot probing approach to models such as Qwen3-32B. The "current-entity" and "prior-entity" slots are largely orthogonal and serve different functional roles. The prior-entity slot supports relational inferences, such as determining the order in which entities appear or detecting conflicting traits between adjacent entities. Explicit factual retrieval, however, relies only on the current-entity slot: questions like "Is anyone in the story tall?" or "What is the tall entity's name?" are answered from it alone, even though the same information is linearly decodable from the prior-entity slot.

Non-frontier models also struggle with syntax that requires two subject-verb-object bindings on a single token (e.g., "Alice prepares and Bob consumes food."), achieving near-chance accuracy. Recent frontier models such as Claude Opus-4.5 and Gemini-3-Pro parse these sentences correctly, which suggests they use more sophisticated binding strategies.

Key takeaway

AI scientists and machine learning engineers who develop or evaluate LLMs should recognize that information present in a model's activations is not necessarily used for every task. If your application requires complex multi-entity binding on single tokens, especially for explicit retrieval, current open-source, non-frontier models may fail. Consider using frontier models, or design prompts that recast complex statements as simpler, single-binding sentences to improve reliability.
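
The prompt-recasting advice can be illustrated with a toy example. This is a minimal sketch, assuming the caller has already identified the (subject, verb) pairs and the shared object; `recast_shared_object` is a hypothetical helper, not something from the study, and it does no parsing of its own.

```python
from typing import List, Tuple


def recast_shared_object(clauses: List[Tuple[str, str]], shared_object: str) -> str:
    """Expand clauses that share one object into one sentence per binding.

    Hypothetical helper for illustration: the (subject, verb) pairs and the
    shared object are supplied by the caller rather than parsed from text.
    """
    return " ".join(f"{subject} {verb} {shared_object}." for subject, verb in clauses)


# Double-binding form from the summary: "Alice prepares and Bob consumes food."
rewritten = recast_shared_object([("Alice", "prepares"), ("Bob", "consumes")], "food")
print(rewritten)  # Alice prepares food. Bob consumes food.
```

Each output sentence carries a single subject-verb-object binding, so the object token no longer has to serve two clauses at once.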

Key insights

LLMs represent entities in distinct, largely orthogonal "current-entity" and "prior-entity" slots, but only the current-entity slot is used for explicit factual retrieval.
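
One way to quantify how orthogonal two slots are is to compare the subspaces spanned by their probe directions. The sketch below is an assumption about how such a check could be done, not the study's own metric: it takes two hypothetical matrices of probe directions (one column per direction) and reports the mean squared cosine of the principal angles between their column spaces.

```python
import numpy as np


def subspace_overlap(A: np.ndarray, B: np.ndarray) -> float:
    """Mean squared cosine of the principal angles between span(A) and span(B).

    A and B have shape (d_model, k) with full column rank.
    Returns 0.0 for orthogonal subspaces and 1.0 for identical ones.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    # Singular values of Qa^T Qb are the cosines of the principal angles.
    cosines = np.linalg.svd(Qa.T @ Qb, compute_uv=False)
    return float(np.mean(cosines ** 2))


# Toy directions standing in for learned probe weights (illustrative only).
basis = np.eye(8)
current_slot = basis[:, :2]  # spans axes 0 and 1
prior_slot = basis[:, 2:4]   # spans axes 2 and 3

print(subspace_overlap(current_slot, prior_slot))    # orthogonal subspaces
print(subspace_overlap(current_slot, current_slot))  # identical subspaces
```

With real probes, a value near zero would be consistent with the "largely orthogonal" description, while a value near one would indicate that both slots read from the same directions.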

Method

A multi-slot probing architecture that combines mixture-of-experts and routing layers disentangles a single token's residual stream activation, recovering information about both the current and the prior entity and revealing that each is encoded with a distinct representational scheme.
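
The summary does not give the probe's exact design, so the following is only a sketch of one plausible reading: a soft-routed mixture of linear experts applied to a residual stream activation. The class name `MultiSlotProbe`, the layer sizes, and the use of a softmax router are all assumptions, and the weights are random rather than trained.

```python
import numpy as np


def softmax(z: np.ndarray, axis: int = -1) -> np.ndarray:
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class MultiSlotProbe:
    """Hypothetical mixture-of-experts probe: one linear expert per slot plus a router."""

    def __init__(self, d_model: int, n_experts: int, n_classes: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Router maps an activation to one score per expert.
        self.W_router = rng.normal(scale=0.02, size=(d_model, n_experts))
        # Each expert is a linear readout from the activation to class logits.
        self.W_experts = rng.normal(scale=0.02, size=(n_experts, d_model, n_classes))

    def forward(self, h: np.ndarray):
        """h: (batch, d_model) residual stream activations for single tokens."""
        gate = softmax(h @ self.W_router)                            # (batch, n_experts)
        expert_logits = np.einsum("bd,edc->bec", h, self.W_experts)  # (batch, n_experts, n_classes)
        mixed = np.einsum("be,bec->bc", gate, expert_logits)         # (batch, n_classes)
        return gate, expert_logits, mixed


# Random activations stand in for real model states (illustrative only).
h = np.random.default_rng(1).normal(size=(4, 64))
probe = MultiSlotProbe(d_model=64, n_experts=2, n_classes=5)
gate, expert_logits, mixed = probe.forward(h)
print(gate.shape, expert_logits.shape, mixed.shape)  # (4, 2) (4, 2, 5) (4, 5)
```

In a trained probe of this shape, inspecting `expert_logits` per expert would show what each expert reads out from the same activation, and `gate` would show which expert the router favors for a given token.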

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.