What do Language Models Learn and When? The Implicit Curriculum Hypothesis

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The paper introduces the Implicit Curriculum Hypothesis, which proposes that large language model pretraining acquires skills in a compositional, predictable order that holds across models and data. To test this, the authors designed a suite of simple, composable tasks covering retrieval, morphological transformations, coreference, logical reasoning, and mathematics. Tracking the points at which skills emerge across four model families, ranging from 410M to 13B parameters, revealed strikingly consistent emergence orderings, with a correlation of $\rho = .81$ across 45 model pairs. Composite tasks consistently emerged after their component tasks. The study also found that this structure is encoded in model representations: tasks with similar function vector representations exhibit similar training trajectories. This makes it possible to predict the training trajectories of simple held-out compositional tasks, achieving an $R^2$ of $.68$ to $.84$ across models.
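The cross-model comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's procedure: the accuracy threshold used to define an emergence point, the task names, the model names, and the toy step counts are all hypothetical, and the statistic is assumed here to be a rank correlation (Pearson correlation computed on ranks). As a side note, 45 pairs is the number of unordered pairs among 10 models.

```python
from itertools import combinations
from math import comb, sqrt


def emergence_step(accuracies, steps, threshold=0.5):
    """First checkpoint step whose accuracy reaches the threshold, else None.

    The threshold-based definition is an assumption made for this sketch.
    """
    for step, acc in zip(steps, accuracies):
        if acc >= threshold:
            return step
    return None


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def rank_correlation(x, y):
    """Correlation between two orderings: Pearson applied to ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical emergence points (training step at which each task emerges).
emergence = {
    "model_a": {"copy": 1000, "plural": 2000, "coref": 4000, "add": 8000},
    "model_b": {"copy": 500, "plural": 3000, "coref": 2500, "add": 9000},
    "model_c": {"copy": 800, "plural": 1500, "coref": 5000, "add": 7000},
}

tasks = sorted(emergence["model_a"])
pair_scores = {}
for a, b in combinations(sorted(emergence), 2):
    xs = [emergence[a][t] for t in tasks]
    ys = [emergence[b][t] for t in tasks]
    pair_scores[(a, b)] = rank_correlation(xs, ys)

mean_rho = sum(pair_scores.values()) / len(pair_scores)
print(f"{len(pair_scores)} pairs, mean correlation {mean_rho:.2f}")
print("pairs among 10 models:", comb(10, 2))
```

Working on ranks rather than raw step counts keeps the comparison about ordering, so models trained for different numbers of steps can still be compared.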

Key takeaway

For research scientists developing or evaluating large language models, the practical point is that pretraining appears to follow a predictable, compositional order of skill acquisition. If the hypothesis holds for your model, you can anticipate roughly which capabilities will emerge and in what order, and you can use internal representations to predict the learning trajectories of simple held-out compositional tasks. That could inform the design of training regimes and reduce the number of evaluation cycles needed.

Key insights

LLM pretraining follows a consistent, compositional skill emergence order, predictable from internal representations.

Method

A task suite tracks when each skill emerges in each model family. Emergence orderings are compared across models, and the similarity between tasks' function vector representations is used to predict the training trajectories of held-out tasks.
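One way to picture the prediction step is a similarity-weighted average: a held-out task's trajectory is estimated from the trajectories of known tasks, weighted by how close their function vectors are. The sketch below is an assumption-laden illustration, not the predictor used in the paper. The softmax weighting, the `temperature` parameter, the function names, and the toy vectors and trajectories are all hypothetical.

```python
import numpy as np


def cosine_similarity(u, v):
    """Cosine similarity between two 1-D vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


def predict_trajectory(held_out_vec, known_vecs, known_trajs, temperature=0.1):
    """Estimate a held-out task's accuracy-over-training curve.

    known_vecs:  (n_tasks, dim) function vectors of tasks with known curves
    known_trajs: (n_tasks, n_checkpoints) accuracy at each checkpoint
    The softmax-over-similarity weighting is an assumption of this sketch.
    """
    sims = np.array([cosine_similarity(held_out_vec, v) for v in known_vecs])
    weights = np.exp((sims - sims.max()) / temperature)
    weights /= weights.sum()
    return weights @ known_trajs


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1 - ss_res / ss_tot)


# Toy data: two known tasks with orthogonal function vectors.
known_vecs = np.array([[1.0, 0.0], [0.0, 1.0]])
known_trajs = np.array([[0.0, 0.5, 1.0], [0.0, 0.0, 0.2]])

# A held-out task whose function vector matches the first known task.
held_out_vec = np.array([1.0, 0.0])
predicted = predict_trajectory(held_out_vec, known_vecs, known_trajs)

# Score the prediction against a (hypothetical) observed trajectory.
observed = np.array([0.0, 0.45, 0.95])
print("predicted:", np.round(predicted, 3))
print("R^2:", round(r_squared(observed, predicted), 3))
```

A lower `temperature` makes the prediction lean on the nearest task, while a higher one averages more broadly across tasks.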

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.