Google is cooking: Beyond the 'Next-Token' Manifold

2026-02-03 · Source: Discover AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Advanced, long

Summary

Google DeepMind, in collaboration with Princeton University, published a study on January 29, 2026, revealing that transformer models rely on geometric principles to process language. Building on a 2023 MIT study, the research indicates that when a transformer processes natural language or simple grid-walk sequences, its middle layers (15-25) extensively straighten neural sentence trajectories, actively untangling sequences into linear path representations. This linearity is observed in the hidden states that produce logits, and it is crucial for next-token prediction because the next state can be reached by linear extrapolation. However, the study also found that this linearity collapses during reasoning tasks, such as few-shot Q&A, where the model performs nonlinear jumps, or "manifold hops," to higher dimensions. This suggests that current LLMs struggle with long-term reasoning and planning, often getting trapped in local minima.

Key takeaway

For research scientists developing next-generation LLMs, understanding the geometric underpinnings of transformer behavior is critical. Your models linearize their representations to keep text flowing coherently, but true reasoning requires nonlinear mechanisms, and current architectures struggle to apply them to long-term planning. You should focus on designing new transformer architectures that can evaluate future states, not just predict the next token, to overcome inherent limitations in complex decision-making.

Key insights

Transformer models linearize internal representations for flow, but use nonlinear jumps for reasoning.

Principles

Linearity facilitates simpler next-token prediction.
Residual streams preserve information additively.
Reasoning tasks require nonlinear manifold operations.
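
The first principle can be illustrated with a toy example. The sketch below is not the study's procedure: it assumes hidden states are plain lists of floats and uses two hypothetical helpers, extrapolate_next and extrapolation_error, to show that continuing the last step predicts the next point exactly on a straight path and misses it on a curved one.

```python
import math


def extrapolate_next(states):
    """Predict the next point by repeating the last step: h_t + (h_t - h_{t-1})."""
    prev, last = states[-2], states[-1]
    return [2 * b - a for a, b in zip(prev, last)]


def extrapolation_error(states):
    """Distance between the point extrapolated from states[:-1] and the true last point."""
    predicted = extrapolate_next(states[:-1])
    return math.dist(predicted, states[-1])


# Toy 2-D trajectories standing in for per-token hidden states (illustrative data).
straight = [[0.0, 0.0], [1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
curved = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

print(extrapolation_error(straight))  # no error on the straight path
print(extrapolation_error(curved))    # nonzero error once the path bends
```

In a real model the extrapolated state would still have to pass through the output head to produce logits; the example only captures the geometric intuition.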

Method

To quantify trajectory straightening, the study computed sentence curvature in each layer as the average angle between consecutive vectors connecting the representations of adjacent words.
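
A minimal sketch of this curvature measure for a single layer follows. It assumes each token's hidden state is a plain list of floats; the function names turning_angles and sentence_curvature are illustrative and do not come from the study, and the two trajectories are toy data.

```python
import math


def turning_angles(states):
    """Angles in radians between consecutive segments of a trajectory.

    `states` holds one vector per token, all from the same layer.
    Repeated points (zero-length segments) are not handled in this sketch.
    """
    # Segment vectors connecting adjacent token representations.
    segments = [
        [b - a for a, b in zip(prev, nxt)]
        for prev, nxt in zip(states, states[1:])
    ]
    angles = []
    for u, v in zip(segments, segments[1:]):
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.hypot(*u) * math.hypot(*v)
        # Clamp to [-1, 1] to guard against floating-point drift.
        cos = max(-1.0, min(1.0, dot / norm))
        angles.append(math.acos(cos))
    return angles


def sentence_curvature(states):
    """Mean turning angle for one layer; 0 means a perfectly straight path."""
    angles = turning_angles(states)
    return sum(angles) / len(angles)


straight = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]
zigzag = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [2.0, 1.0]]

print(sentence_curvature(straight))  # straight path: curvature 0
print(sentence_curvature(zigzag))    # two right-angle turns: mean of pi/2
```

Running the same function on the states of each layer in turn gives a per-layer curvature profile, which is the kind of quantity in which straightening in the middle layers would show up as a dip.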

In practice

Examine hidden states for linearity in sequence tasks.
Identify nonlinear jumps for complex reasoning.
Design new architectures for long-term planning.
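
One rough way to start on the first two items is to flag positions where a trajectory turns sharply. The sketch below is an assumption-laden proxy, not the study's method: the helper find_jumps and its 60-degree threshold are hypothetical, and a large turning angle is only a crude stand-in for a "manifold hop."

```python
import math


def find_jumps(states, threshold=math.pi / 3):
    """Return token indices where the path turns by more than `threshold` radians.

    The threshold is an arbitrary illustrative choice, not a value from the study.
    """
    jumps = []
    for i in range(1, len(states) - 1):
        # Incoming and outgoing segments at token i.
        u = [b - a for a, b in zip(states[i - 1], states[i])]
        v = [b - a for a, b in zip(states[i], states[i + 1])]
        dot = sum(x * y for x, y in zip(u, v))
        norm = math.hypot(*u) * math.hypot(*v)
        if norm == 0.0:
            continue  # repeated point: turning angle is undefined
        angle = math.acos(max(-1.0, min(1.0, dot / norm)))
        if angle > threshold:
            jumps.append(i)
    return jumps


# Toy path: straight along x, one sharp turn at index 2, then straight along y.
path = [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0], [2.0, 3.0], [2.0, 4.0]]
print(find_jumps(path))
```

On real hidden states the threshold would need to be calibrated against the typical turning angle of the layer being inspected.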

Topics

Neural Sentence Trajectory
Transformer Architecture
In-Context Learning
Geometric Representation
LLM Reasoning Limitations

Best for: Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Discover AI.