Slot Machines: How LLMs Keep Track of Multiple Entities
Summary
Large language models (LLMs) track multiple entities in a context using distinct "slots": one for the entity currently being described and one for the entity immediately preceding it. This is the finding of a study that applies a novel multi-slot probing approach to models such as Qwen3-32B. The "current-entity" and "prior-entity" slots are largely orthogonal and serve different functional roles. The prior-entity slot supports relational inferences, such as determining the order of entities or detecting conflicting traits between adjacent entities. Explicit factual retrieval, such as answering "Is anyone in the story tall?" or "What is the tall entity's name?", draws only on the current-entity slot, even though the necessary information is also linearly decodable from the prior-entity slot. Non-frontier models perform near chance on syntax that requires two subject-verb-object bindings on a single token (e.g., "Alice prepares and Bob consumes food.", where "food" is the object of both verbs). Recent frontier models such as Claude Opus-4.5 and Gemini-3-Pro parse these sentences correctly, which suggests they use more sophisticated binding strategies.
Key takeaway
If you develop or evaluate LLMs as an AI scientist or machine learning engineer, recognize that information present in a model's activations is not always used for every task. If your application requires complex multi-entity binding on single tokens, especially for explicit retrieval, current open-source models may fail. Consider using frontier models, or design prompts that recast complex statements as simpler, single-binding sentences to improve reliability and performance.
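The prompt-recasting advice can be illustrated with a minimal sketch. The helper name `recast_dual_binding` and its regular expression are hypothetical and only handle the exact "X verbs and Y verbs object." pattern from the summary's example; a real pipeline would need a proper parser.

```python
import re

# Hypothetical pattern: "<Subj1> <verb1> and <Subj2> <verb2> <object>."
# It covers only this one shape and is an assumption for illustration.
_DUAL_BINDING = re.compile(
    r"^(?P<s1>\w+) (?P<v1>\w+) and (?P<s2>\w+) (?P<v2>\w+) (?P<obj>[^.]+)\.$"
)

def recast_dual_binding(sentence: str) -> str:
    """Split a shared-object sentence into two single-binding sentences."""
    match = _DUAL_BINDING.match(sentence.strip())
    if match is None:
        return sentence  # leave anything the pattern does not recognise untouched
    g = match.groupdict()
    # Repeat the shared object so each sentence carries exactly one binding.
    return f"{g['s1']} {g['v1']} {g['obj']}. {g['s2']} {g['v2']} {g['obj']}."

print(recast_dual_binding("Alice prepares and Bob consumes food."))
# prints "Alice prepares food. Bob consumes food."
```

After recasting, each occurrence of "food" binds to a single subject and verb, which is the simpler case the summary suggests non-frontier models handle more reliably.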
Key insights
LLMs represent entities in distinct, largely orthogonal "current-entity" and "prior-entity" slots, but the prior-entity slot is functionally limited: it supports relational inferences, not explicit factual retrieval.
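One way to quantify how orthogonal two slots are is to compare the subspaces spanned by their probe directions using principal angles. The sketch below is an illustration, not the study's procedure: the function name `subspace_overlap` and the random stand-in directions are assumptions, and real probe weights would replace them.

```python
import numpy as np

def subspace_overlap(A: np.ndarray, B: np.ndarray) -> np.ndarray:
    """Cosines of the principal angles between the column spaces of A and B.

    Values near 0 indicate orthogonal subspaces; values near 1 indicate
    shared directions.
    """
    Qa, _ = np.linalg.qr(A)  # orthonormal basis for span(A)
    Qb, _ = np.linalg.qr(B)  # orthonormal basis for span(B)
    return np.linalg.svd(Qa.T @ Qb, compute_uv=False)

# Stand-in data (assumption): disjoint columns of a random orthonormal basis
# play the role of current-entity and prior-entity probe directions.
rng = np.random.default_rng(0)
d_model = 64
basis, _ = np.linalg.qr(rng.normal(size=(d_model, d_model)))
current_dirs = basis[:, :4]
prior_dirs = basis[:, 4:8]

print(subspace_overlap(current_dirs, prior_dirs).max())    # close to 0
print(subspace_overlap(current_dirs, current_dirs).min())  # close to 1
```

A "largely orthogonal" result would correspond to small but not necessarily zero cosines when the same check is run on learned probe weights.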
Principles
- Information availability does not guarantee model utilization.
- Entity representations are copied and transformed into distinct slots.
- Current/prior-entity slots are a substrate for multi-perspective behaviors.
Method
A multi-slot probing architecture, built from mixture-of-experts and routing layers, disentangles the residual-stream activation at a single token to recover information about both the current and the prior entity, revealing a distinct representational scheme for each.
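The summary does not specify the probe's layers, so the following is only a rough sketch of what a multi-slot probe with routing over experts could look like. The class name `MultiSlotProbe`, the soft (softmax) routing, the linear experts, and all dimensions are assumptions; the weights are random and untrained, so only the forward pass and shapes are shown.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

class MultiSlotProbe:
    """Hypothetical probe: each slot has its own router and linear experts."""

    def __init__(self, d_model, n_experts, n_classes, n_slots=2, seed=0):
        rng = np.random.default_rng(seed)
        # Router weights per slot: d_model -> n_experts.
        self.router = rng.normal(scale=0.02, size=(n_slots, d_model, n_experts))
        # Expert weights per slot: each expert maps d_model -> n_classes.
        self.experts = rng.normal(
            scale=0.02, size=(n_slots, n_experts, d_model, n_classes)
        )

    def __call__(self, h: np.ndarray) -> np.ndarray:
        """h: (batch, d_model) activations taken at one token position.

        Returns class probabilities of shape (batch, n_slots, n_classes),
        e.g. slot 0 for the current entity and slot 1 for the prior entity.
        """
        gate = softmax(np.einsum("bd,sde->bse", h, self.router))  # routing weights
        logits = np.einsum("bd,sedc->bsec", h, self.experts)      # per-expert logits
        mixed = np.einsum("bse,bsec->bsc", gate, logits)          # gate-weighted mix
        return softmax(mixed)

h = np.random.default_rng(1).normal(size=(3, 16))  # stand-in activations
probe = MultiSlotProbe(d_model=16, n_experts=4, n_classes=5)
probs = probe(h)
print(probs.shape)  # (3, 2, 5)
```

The point of the design is that one activation vector feeds every slot, and each slot's router decides which experts read it, so a single token can yield separate predictions about the current and the prior entity.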
In practice
- Expect direct factual retrieval queries to be answered from the current-entity slot.
- Expect relational inferences, such as sequence or conflict detection, to draw on the prior-entity slot.
- Be aware of limitations in non-frontier models for complex dual-binding syntax.
Topics
- Multi-slot Probing
- Entity Binding Mechanisms
- Current-entity Slot
- Prior-entity Slot
- Relational Inferences
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.