Google's Warning: ICL Context is Inert

Source: Discover AI · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Advanced, long

Summary

A study from Google DeepMind, Brown University, and New York University, published February 4, 2026, reports that large language models (LLMs) struggle to use representations they acquire through in-context learning (ICL). The models do encode complex topologies, such as 5x5 grids, in their residual streams with high fidelity (up to 85% distance correlation), yet they fail at adaptive world modeling tasks that require deploying those representations. On a 16-state 1D chain, accuracy reached 60%. On 2D grids (4x4, 5x5), accuracy fell below 50% for one-step tasks and below 20% for two-step and three-step tasks. The limitation holds across open-weight LLMs from 4B to 27B parameters, and also for large proprietary reasoning models such as "GPT5" given extensive chain-of-thought budgets (up to 5,000 tokens), where accuracy on 2D grids collapses below 10%. The core issue appears to be that self-attention cannot interpret and act on these well-structured internal representations, which leaves them functionally inert for complex reasoning.
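The summary does not say how the "distance correlation" figure is computed. One plausible reading, sketched below purely as an assumption, is a Pearson correlation between pairwise distances in representation space and pairwise distances on the underlying grid. The function names (`grid_distance`, `distance_agreement`) and the choice of Euclidean and Manhattan distances are illustrative, not taken from the study.

```python
import math
from itertools import combinations


def grid_distance(a, b, cols):
    """Manhattan distance between two states of a grid with `cols` columns.

    States are assumed to be numbered row by row; a 1D chain is a grid
    with a single row.
    """
    ra, ca = divmod(a, cols)
    rb, cb = divmod(b, cols)
    return abs(ra - rb) + abs(ca - cb)


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def pearson(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(vx * vy)


def distance_agreement(reps, cols):
    """Correlate representation distances with grid distances.

    `reps` maps each state index to a vector (hypothetically, a residual
    stream activation for that state's token). This is an assumed metric,
    not necessarily the one used in the study.
    """
    pairs = list(combinations(sorted(reps), 2))
    rep_d = [euclidean(reps[a], reps[b]) for a, b in pairs]
    graph_d = [grid_distance(a, b, cols) for a, b in pairs]
    return pearson(rep_d, graph_d)
```

Under this reading, a 16-state chain whose representations lie evenly spaced on a line scores close to 1.0, and scrambled representations score much lower. A high score says only that the geometry is present in the activations; it says nothing about whether the model can use it, which is the gap the study highlights.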

Key takeaway

For AI scientists developing or deploying LLMs on tasks that require spatial or topological reasoning, this research points to a fundamental limitation in current ICL and self-attention mechanisms. Your models may encode complex "maps" internally, but the reported results suggest they cannot reliably "read" or act on them for multi-step inference. Consider architectural innovations beyond standard self-attention, or specialized training for non-linear reasoning, to overcome this functional inertness, especially for applications in chemistry, finance, or physics.

Key insights

LLMs encode complex topologies internally but cannot functionally utilize these in-context learned representations for multi-step reasoning.

Method

The study used an "adaptive world modeling" task: after seeing few-shot examples, an LLM must navigate novel steps on a given topology (1D chains, 2D grids). Accuracy was measured separately for 1-step, 2-step, and 3-step queries.
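To make the task shape concrete, here is a minimal sketch of how such a benchmark could be generated. The exact protocol is not given in this summary, so everything below is an assumption: the context is modeled as a random walk over numbered states, and a k-step answer counts as correct if it lies in the set of states reachable in exactly k moves. The names `grid_neighbors`, `random_walk`, and `k_step_reachable` are hypothetical.

```python
import random


def grid_neighbors(rows, cols):
    """Adjacency lists for a rows x cols grid; rows=1 yields a 1D chain."""
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = []
            for dr, dc in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    nbrs.append(rr * cols + cc)
            adj[r * cols + c] = nbrs
    return adj


def random_walk(adj, length, rng):
    """Sample a walk of `length` states; stands in for the few-shot context."""
    node = rng.choice(sorted(adj))
    walk = [node]
    for _ in range(length - 1):
        node = rng.choice(adj[node])
        walk.append(node)
    return walk


def k_step_reachable(adj, start, k):
    """States reachable from `start` in exactly k moves (assumed scoring rule)."""
    frontier = {start}
    for _ in range(k):
        frontier = {n for node in frontier for n in adj[node]}
    return frontier


if __name__ == "__main__":
    rng = random.Random(0)
    adj = grid_neighbors(4, 4)            # a 4x4 grid, as in the study's 2D setting
    context = random_walk(adj, 50, rng)   # would be rendered into the prompt
    start = context[-1]
    for k in (1, 2, 3):
        print(k, sorted(k_step_reachable(adj, start, k)))
```

In a real evaluation the states would presumably be given arbitrary labels so the model cannot rely on the numbering, and the model's answer for each k would be checked against the reachable set. Multi-step queries require composing several moves on the inferred topology, which is consistent with the sharper accuracy drop the study reports for two-step and three-step tasks.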

Best for: AI Scientist, Research Scientist, AI Researcher, Machine Learning Engineer, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.