Structure Before Collapse: Transient semantic geometry in next-token prediction

2026-06-25 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

The paper "Structure Before Collapse: Transient semantic geometry in next-token prediction" by Yize Zhao, Isabel Papadimitriou, and Christos Thrampoulidis investigates a paradox in language models. Despite being trained predominantly with one-hot labels, which Neural Collapse theory predicts should lead to symmetric, semantically undifferentiated representations, these models clearly learn latent structural features. Through three synthetic controlled settings, the authors demonstrate that semantic geometry emerges early in training, causing representations to cluster by shared attributes even without explicit supervision. This emergent structure is transient; with sufficient capacity and training time, the model eventually converges to the predicted symmetric state. The study employs Gram matrix analysis to examine this phase transition and proposes a preliminary modification to the commonly used unconstrained features model to better capture the emergent semantic geometry.

Key takeaway

For AI scientists optimizing language model training, the point to recognize is that valuable semantic structure emerges early but is transient. A model may learn rich semantic geometry initially, only for it to collapse into symmetric, semantically undifferentiated representations given sufficient capacity and training time. Consider strategies to capture or stabilize this emergent geometry, such as adjusting training duration, adding regularization, or exploring architectural modifications that preserve these latent features.

Key insights

Semantic structure in next-token prediction LMs emerges transiently despite one-hot training, before collapsing to symmetric representations.

Principles

Neural Collapse predicts symmetric representations in one-hot classification.
Semantic geometry can emerge without explicit supervision.
Early training phases can exhibit transient, structured representations.
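
To make the first principle concrete: in the standard balanced classification setting, the symmetric configuration usually associated with Neural Collapse is the simplex equiangular tight frame (ETF), where every pair of class representations has the same cosine similarity, -1/(K-1). Whether the paper uses exactly this reference geometry is an assumption here, and the function name is illustrative only.

```python
import numpy as np

def simplex_etf_gram(k):
    """Gram matrix of k unit-norm vectors forming a simplex ETF.

    Diagonal entries are 1; every off-diagonal entry is -1/(k-1),
    so no pair of classes is closer than any other pair.
    """
    return (k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)

G = simplex_etf_gram(4)
print(np.round(G, 3))
```

A Gram matrix with this shape carries no information about which classes share attributes, which is why structured off-diagonal blocks in an observed Gram matrix signal a departure from the collapsed state.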

Method

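The authors investigate how semantic structure emerges in three controlled synthetic settings, using Gram matrix analysis to follow the transition from structured to symmetric representations. They also propose a preliminary modification to the unconstrained features model so that it better captures the emergent semantic geometry.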
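
For readers unfamiliar with the unconstrained features model, the following is a minimal sketch of its standard form, in which the last-layer features H and the classifier W are both treated as free variables and trained with cross-entropy plus a ridge penalty. It does not implement the modification the paper proposes, which is not described in this summary; the hyperparameters and function names are assumptions chosen for illustration.

```python
import numpy as np

def softmax(z):
    # Column-wise softmax with max subtraction for numerical stability.
    z = z - z.max(axis=0, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=0, keepdims=True)

def ufm_loss_and_grads(W, H, Y, lam):
    """Cross-entropy plus ridge penalty, with W and H both free.

    W: (k, d) classifier, H: (d, n) free features, Y: (k, n) one-hot labels.
    """
    n = H.shape[1]
    P = softmax(W @ H)
    ce = -np.sum(Y * np.log(P + 1e-12)) / n
    loss = ce + 0.5 * lam * (np.sum(W ** 2) + np.sum(H ** 2))
    dZ = (P - Y) / n                      # gradient w.r.t. the logits
    return loss, dZ @ H.T + lam * W, W.T @ dZ + lam * H

def train_ufm(k=4, d=8, lam=5e-3, lr=0.5, steps=500, seed=0):
    # Hypothetical settings: one free feature vector per class.
    rng = np.random.default_rng(seed)
    W = 0.1 * rng.standard_normal((k, d))
    H = 0.1 * rng.standard_normal((d, k))
    Y = np.eye(k)
    losses = []
    for _ in range(steps):
        loss, gW, gH = ufm_loss_and_grads(W, H, Y, lam)
        W -= lr * gW
        H -= lr * gH
        losses.append(loss)
    return W, H, losses

W, H, losses = train_ufm()
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Because the features are optimized directly, the model abstracts away the network that produces them, which makes the final geometry tractable to analyze but leaves out anything that would tie features of related classes together.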

In practice

Track representation dynamics across training checkpoints with Gram matrix analysis.
Inspect early training checkpoints, where semantic structure is most likely to be present.
Explore model modifications to preserve emergent geometry.
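
The first suggestion above can be sketched as follows, assuming one mean representation per class is available at each checkpoint along with an attribute label per class. The metric `attribute_gap` is a hypothetical diagnostic introduced here for illustration, not a quantity taken from the paper, and the two toy inputs are hand-built rather than outputs of a trained model.

```python
import numpy as np

def cosine_gram(H):
    """Gram matrix of globally centered, unit-normalized rows.

    H: (num_classes, dim), one mean representation per class.
    """
    H = H - H.mean(axis=0, keepdims=True)
    H = H / np.linalg.norm(H, axis=1, keepdims=True)
    return H @ H.T

def attribute_gap(G, attrs):
    """Mean similarity of same-attribute pairs minus different-attribute pairs."""
    attrs = np.asarray(attrs)
    same = attrs[:, None] == attrs[None, :]
    off_diag = ~np.eye(len(attrs), dtype=bool)
    return G[same & off_diag].mean() - G[~same].mean()

attrs = [0, 0, 1, 1]  # classes 0,1 share one attribute; classes 2,3 share another

# Toy "early" geometry: a shared attribute direction plus a class-specific one.
structured = np.zeros((4, 6))
for i, a in enumerate(attrs):
    structured[i, a] = 2.0      # attribute direction
    structured[i, 2 + i] = 1.0  # class-specific direction

# Toy "collapsed" geometry: one orthogonal direction per class.
collapsed = np.eye(4, 6)

gap_structured = attribute_gap(cosine_gram(structured), attrs)
gap_collapsed = attribute_gap(cosine_gram(collapsed), attrs)
print(gap_structured, gap_collapsed)
```

Computed at successive checkpoints, a gap that rises and then decays toward zero would be consistent with the transient structure the paper describes; a gap that stays near zero throughout would not.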

Topics

Next-token prediction
Neural Collapse
Semantic Geometry
Language Models
Representation Learning
Gradient Descent Dynamics

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.