Google is cooking: Beyond the 'Next-Token' Manifold
Summary
Google DeepMind, in collaboration with Princeton University, published a study on January 29, 2026, reporting that transformer models rely on geometric structure to process language. Building on a 2023 MIT study, the research indicates that when a model processes natural language or simple grid-walk sequences, its middle layers (15-25) substantially straighten neural sentence trajectories, untangling sequences into near-linear paths. This linearity is observed in the hidden states that produce logits, and it supports next-token prediction by linear extrapolation. The study also found that the linearity collapses during reasoning tasks such as few-shot Q&A, where the model instead makes nonlinear jumps, or "manifold hops," into higher dimensions. This suggests that current LLMs struggle with long-term reasoning and planning and often get trapped in local minima.
Key takeaway
For research scientists developing next-generation LLMs, understanding the geometric underpinnings of transformer behavior is critical. Models linearize their representations to keep text flowing coherently, but reasoning requires nonlinear mechanisms, and current architectures handle these poorly when long-term planning is involved. Focus on designing transformer architectures that can evaluate future states rather than only predict the next token, so that they can overcome these limitations in complex decision-making.
Key insights
Transformer models linearize internal representations for flow, but use nonlinear jumps for reasoning.
Principles
- Linearity facilitates simpler next-token prediction.
- Residual streams preserve information additively.
- Reasoning tasks require nonlinear manifold operations.
Method
To quantify trajectory straightening, the study computed sentence curvature in each layer as the average angle between the vectors connecting adjacent words.
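The curvature measure described here can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the study's own code: the function name `mean_curvature` is hypothetical, the input is assumed to be one layer's hidden states for a single sentence with shape (num_tokens, hidden_dim), and angles are reported in radians.

```python
import numpy as np

def mean_curvature(states: np.ndarray) -> float:
    """Average angle (radians) between consecutive step vectors.

    states: one layer's hidden states for one sentence,
            shape (num_tokens, hidden_dim). Hypothetical helper,
            written for illustration only.
    """
    if states.ndim != 2 or states.shape[0] < 3:
        raise ValueError("expected shape (num_tokens >= 3, hidden_dim)")

    # Vectors connecting each token's state to the next token's state.
    steps = np.diff(states, axis=0)
    norms = np.linalg.norm(steps, axis=1)
    if np.any(norms == 0):
        raise ValueError("two adjacent states are identical; angle undefined")

    # Cosine of the angle between each pair of consecutive steps.
    unit = steps / norms[:, None]
    cosines = np.sum(unit[:-1] * unit[1:], axis=1)

    # Clip guards against rounding slightly outside [-1, 1].
    angles = np.arccos(np.clip(cosines, -1.0, 1.0))
    return float(angles.mean())
```

Running this once per layer and comparing the values would show where trajectories straighten: a perfectly straight path gives a value near 0, and a path that turns at right angles at every step gives pi/2. With Hugging Face Transformers, per-layer states could be obtained by passing `output_hidden_states=True` to the model call, though the study's actual extraction pipeline is not described in this summary.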
In practice
- Examine hidden states for linearity in sequence tasks.
- Identify nonlinear jumps during complex reasoning tasks.
- Design new architectures for long-term planning.
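One way to act on the first two items is to test how well a simple linear extrapolation predicts each next hidden state, and to flag positions where it fails. The sketch below is an assumption-laden diagnostic, not a method taken from the study: the function names `extrapolation_error` and `flag_jumps` and the default threshold of 0.5 are hypothetical choices made for illustration.

```python
import numpy as np

def extrapolation_error(states: np.ndarray) -> np.ndarray:
    """Relative error of a constant-velocity guess for each next state.

    states: hidden states of shape (num_tokens, hidden_dim).
    Returns one value per token from index 2 onward.
    """
    if states.ndim != 2 or states.shape[0] < 3:
        raise ValueError("expected shape (num_tokens >= 3, hidden_dim)")

    # Linear extrapolation: x_t + (x_t - x_{t-1}) = 2 * x_t - x_{t-1}.
    predicted = 2.0 * states[1:-1] - states[:-2]
    actual = states[2:]

    miss = np.linalg.norm(actual - predicted, axis=1)
    step = np.linalg.norm(actual - states[1:-1], axis=1)
    # Normalize by the true step length; the floor avoids division by zero.
    return miss / np.maximum(step, 1e-12)

def flag_jumps(states: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Token indices where extrapolation misses by more than `threshold`.

    The threshold is an arbitrary illustrative value, not from the study.
    """
    errors = extrapolation_error(states)
    # errors[i] describes token i + 2, so shift indices accordingly.
    return np.flatnonzero(errors > threshold) + 2
```

On a straight, evenly spaced trajectory every error is near 0 and nothing is flagged. A sharp turn shows up as a single flagged index at the token where the path changes direction, which is the kind of position one might inspect when looking for the nonlinear jumps the summary describes.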
Topics
- Neural Sentence Trajectory
- Transformer Architecture
- In-Context Learning
- Geometric Representation
- LLM Reasoning Limitations
Best for: Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.