Structure Before Collapse: Transient semantic geometry in next-token prediction
Summary
The paper "Structure Before Collapse: Transient semantic geometry in next-token prediction" investigates an apparent paradox: language models (LMs) learn latent semantic structure even though they are trained predominantly with one-hot labels. According to Neural Collapse theory, this training regime should push representations toward a symmetric, equally separated configuration that ignores semantic similarity. Using three controlled synthetic settings, the authors find that semantic geometry, in which representations cluster by shared attributes, emerges early in training without explicit supervision. This structure is transient, however: given sufficient model capacity and training time, the representations eventually collapse to the predicted symmetric state. The study uses Gram matrix analysis to examine this phase transition and proposes a preliminary modification to the unconstrained features model so that it better captures the emergent semantic geometry.
Key takeaway
For AI scientists and NLP engineers designing or fine-tuning language models, the transient nature of semantic geometry matters: model capacity and training duration influence whether representations retain clustered semantic structure or collapse to a symmetric state that ignores semantic similarity. Because the findings come from synthetic settings, it is worth exploring whether specific training strategies can preserve this emergent structure in practice, and whether doing so helps performance or interpretability. Note that the paper's proposed change to the unconstrained features model is a modification of a theoretical analysis tool meant to capture the emergent geometry, not an architectural change to the language model itself.
Key insights
Language models transiently learn semantic structure from one-hot labels before representations collapse to a symmetric, non-semantic state.
Principles
- Neural Collapse predicts symmetric representations from one-hot labels.
- Semantic geometry emerges without explicit supervision.
- Latent semantic structure is transient in LMs.
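For reference, the symmetric geometry that the first principle refers to is commonly formalized as a simplex equiangular tight frame (ETF). The statement below is a sketch under assumptions: it uses the standard balanced setting with K classes, and the notation (h_i for the centered, unit-norm representation of class i, G for their Gram matrix) is chosen here for illustration rather than taken from the paper.

```latex
G_{ij} = \langle h_i, h_j \rangle =
\begin{cases}
1, & i = j,\\
-\dfrac{1}{K-1}, & i \neq j,
\end{cases}
\qquad\text{equivalently}\qquad
G = \frac{K}{K-1}\left(I_K - \frac{1}{K}\,\mathbf{1}\mathbf{1}^{\top}\right).
```

Every pair of distinct classes is equally dissimilar in this configuration, so G carries no information about shared attributes. Semantic geometry would show up as off-diagonal entries that deviate from this constant.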
Method
The authors study the emergence and collapse of semantic structure in three controlled synthetic settings, each with latent semantic factors and distinct one-hot labels, and analyze the learned representations through their Gram matrices.
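A Gram matrix analysis of this kind can be sketched roughly as follows. This is a minimal illustration, not the paper's code: the function names (`normalized_gram`, `simplex_etf_gram`, `distance_to_collapse`), the toy data with two binary attributes, and the use of a Frobenius distance to the simplex ETF Gram matrix as a collapse measure are all assumptions made for this example.

```python
import numpy as np


def normalized_gram(features: np.ndarray) -> np.ndarray:
    """Gram matrix of globally centered, unit-norm rows (one row per class)."""
    centered = features - features.mean(axis=0, keepdims=True)
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    return unit @ unit.T


def simplex_etf_gram(k: int) -> np.ndarray:
    """Gram matrix of a k-class simplex ETF: 1 on the diagonal, -1/(k-1) elsewhere."""
    return (k / (k - 1)) * (np.eye(k) - np.ones((k, k)) / k)


def distance_to_collapse(features: np.ndarray) -> float:
    """Frobenius distance between the observed Gram matrix and the ETF target."""
    k = features.shape[0]
    return float(np.linalg.norm(normalized_gram(features) - simplex_etf_gram(k)))


# Hypothetical "semantic" representations: 4 classes defined by two binary
# attributes (a, b), embedded directly as the vector [a, b].
semantic = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])

# Hypothetical "collapsed" representations: one-hot rows form a simplex ETF
# once they are centered and normalized.
collapsed = np.eye(4)

semantic_gram = normalized_gram(semantic)
print(np.round(semantic_gram, 2))
print("semantic distance to ETF:", distance_to_collapse(semantic))
print("collapsed distance to ETF:", distance_to_collapse(collapsed))
```

In the toy semantic case, classes that share one attribute have cosine similarity 0 while classes that share none have similarity -1, so the Gram matrix reflects shared attributes. In the collapsed case every distinct pair sits at -1/3, matching the ETF target. Tracking a distance like this across training checkpoints is one plausible way to watch structure emerge and then collapse.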
Topics
- Language Models
- Next-Token Prediction
- Neural Collapse
- Semantic Geometry
- Gram Matrix Analysis
- Latent Structure
Best for: Research Scientist, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.