What do Language Models Learn and When? The Implicit Curriculum Hypothesis
Summary
This research introduces the Implicit Curriculum Hypothesis, which proposes that large language model pretraining acquires skills in a compositional, predictable order that holds across models and data. To test it, the authors designed a suite of simple, composable tasks covering retrieval, morphological transformations, coreference, logical reasoning, and mathematics. Tracking skill emergence points across four model families ranging from 410M to 13B parameters revealed a striking consistency in emergence orderings, with a Pearson correlation coefficient of $\rho = .81$ across 45 model pairs. Composite tasks consistently emerged after their component tasks. The study also found that this structure is encoded in model representations: tasks with similar function vector representations exhibit similar training trajectories. This allows training trajectories for simple held-out compositional tasks to be predicted effectively, with an $R^2$ of $.68$ to $.84$ across models.
Key takeaway
For research scientists developing or evaluating large language models, the Implicit Curriculum Hypothesis suggests that pretraining follows a predictable, compositional skill acquisition path, so you can anticipate which capabilities will emerge and in what order. Analyzing internal representations to predict learning trajectories may help you design more efficient training regimes, interpret model behavior, and reduce extensive evaluation cycles.
Key insights
LLM pretraining follows a consistent, compositional skill emergence order, predictable from internal representations.
Principles
- Skill emergence is compositional.
- Emergence order is consistent across models.
- Model representations encode skill trajectories.
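The first principle can be checked mechanically once each task has an emergence point. The sketch below is a minimal illustration, not the paper's evaluation code: the task names, the emergence steps, and the `composition_violations` helper are all hypothetical, and it assumes an emergence point is recorded as a training step (or `None` if the task never emerged).

```python
def composition_violations(emergence, components):
    """List (composite, component) pairs where the composite emerges first.

    `emergence` maps task name -> emergence step (None = never emerged).
    `components` maps composite task name -> list of component task names.
    """
    violations = []
    for composite, parts in components.items():
        composite_step = emergence.get(composite)
        if composite_step is None:
            continue  # composite never emerged, so it cannot precede anything
        for part in parts:
            part_step = emergence.get(part)
            # A component that never emerged, or emerged later, is a violation.
            if part_step is None or composite_step < part_step:
                violations.append((composite, part))
    return violations


# Hypothetical emergence steps, made up for illustration.
emergence = {
    "uppercase": 2000,
    "last_letter": 4000,
    "uppercase_last_letter": 8000,
    "reverse": None,
    "reverse_then_uppercase": 4000,
}
components = {
    "uppercase_last_letter": ["uppercase", "last_letter"],
    "reverse_then_uppercase": ["reverse", "uppercase"],
}
violations = composition_violations(emergence, components)
```

An empty result would mean every composite task emerged no earlier than all of its components, which is the pattern the summary reports.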
Method
A task suite is used to track skill emergence across model families. Emergence orderings are compared between models, and the similarity of task representations is used to predict training trajectories.
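As a rough sketch of the first half of this method, the code below derives an emergence point per task from checkpoint accuracies and correlates the results between two models. The accuracy curves are made up, and the fixed 0.5 accuracy threshold is an assumption: the summary does not say how the paper defines an emergence point.

```python
from math import sqrt


def emergence_step(steps, accuracies, threshold=0.5):
    """Return the first checkpoint step whose accuracy reaches `threshold`.

    Returns None if the task never reaches the threshold.
    """
    for step, acc in zip(steps, accuracies):
        if acc >= threshold:
            return step
    return None


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy accuracy curves (made up for illustration), one list per task.
steps = [1000, 2000, 4000, 8000]
model_a = {
    "retrieval": [0.6, 0.8, 0.9, 0.9],
    "coreference": [0.1, 0.5, 0.7, 0.8],
    "arithmetic": [0.0, 0.1, 0.3, 0.6],
}
model_b = {
    "retrieval": [0.5, 0.7, 0.9, 0.9],
    "coreference": [0.1, 0.3, 0.6, 0.8],
    "arithmetic": [0.0, 0.0, 0.2, 0.7],
}

# Use the same task order for both models so the values line up.
tasks = sorted(model_a)
emerge_a = [emergence_step(steps, model_a[t]) for t in tasks]
emerge_b = [emergence_step(steps, model_b[t]) for t in tasks]
rho = pearson(emerge_a, emerge_b)
```

Repeating this for every pair of models and averaging the coefficients would give a single consistency score of the kind the summary reports. Tasks that never emerge return `None` and would need to be dropped or handled before correlating.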
In practice
- Design curricula for specific skills.
- Analyze model representations for skill encoding.
- Predict task learning without full evaluation.
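One way to picture the last item is a similarity-weighted average: the trajectory of a held-out task is estimated from tasks whose representations are close to it. This is a minimal sketch under stated assumptions, not the predictor used in the paper. The two-dimensional "function vectors", the trajectories, and the helper names are hypothetical.

```python
from math import sqrt


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def predict_trajectory(target_vec, known):
    """Average known trajectories, weighted by non-negative cosine similarity.

    `known` is a list of (task_vector, trajectory) pairs whose trajectories
    share the same checkpoints.
    """
    weights = [max(cosine(target_vec, vec), 0.0) for vec, _ in known]
    total = sum(weights)
    if total == 0:
        raise ValueError("no known task is similar to the target")
    n_points = len(known[0][1])
    return [
        sum(w * traj[i] for w, (_, traj) in zip(weights, known)) / total
        for i in range(n_points)
    ]


def r_squared(actual, predicted):
    """Coefficient of determination between actual and predicted values."""
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean) ** 2 for a in actual)
    return 1 - ss_res / ss_tot


# Hypothetical task vectors and accuracy trajectories, made up for illustration.
known = [
    ([1.0, 0.0], [0.1, 0.4, 0.8]),
    ([0.0, 1.0], [0.0, 0.1, 0.3]),
]
predicted = predict_trajectory([1.0, 1.0], known)
```

Comparing `predicted` against the measured trajectory with `r_squared` gives a fit score of the same kind as the $R^2$ values quoted in the summary.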
Topics
- Large Language Models
- LLM Pretraining
- Implicit Curriculum Hypothesis
- Skill Emergence
- Model Representations
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.