Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A phenomenon termed "natural ungrokking" shows that small language models can forget rules they previously learned during pretraining, even while supporting evidence remains in the training data. For instance, a pronoun-gender rule reached 0.94 accuracy by step 925 but scored near zero by step 3,500. This within-run reversal is predictable from the rule's "support frequency", that is, how often the training stream shows the rule winning. The dynamics held across two corpora, three training budgets, and three seeds, and also appeared in public Pythia checkpoints, where collapse depth correlated with model scale. Forgetting happens because a competing surface pattern displaces the rule: log-probability margins cross zero within 100 steps of behavioral collapse. Control is asymmetric. Destroying a rule is straightforward, but restoring it fails even with 450 times the natural support.
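
The link between the margin crossing and behavioral collapse can be illustrated with a minimal sketch. It assumes, as a hypothetical definition not taken from the paper, that the margin at each checkpoint is log p(rule-consistent continuation) minus log p(competing continuation), and that "collapse" means accuracy falling below 0.5 after it had reached 0.9. The function names, thresholds, and checkpoint numbers are illustrative only and are not the paper's measurements.

```python
from typing import Optional, Sequence


def margin_crossing_step(steps: Sequence[int], margins: Sequence[float]) -> Optional[int]:
    """First checkpoint at which the margin goes from positive to non-positive.

    Assumed margin: log p(rule-consistent continuation) - log p(competing continuation),
    so a positive value means the rule is still preferred over the surface pattern."""
    for i in range(1, len(steps)):
        if margins[i - 1] > 0 and margins[i] <= 0:
            return steps[i]
    return None


def collapse_step(steps: Sequence[int], accuracy: Sequence[float],
                  learned: float = 0.9, collapsed: float = 0.5) -> Optional[int]:
    """First checkpoint where accuracy drops below `collapsed` after having reached `learned`.

    Both thresholds are hypothetical choices, not values from the paper."""
    seen_learned = False
    for step, acc in zip(steps, accuracy):
        if acc >= learned:
            seen_learned = True
        elif seen_learned and acc < collapsed:
            return step
    return None


# Synthetic checkpoints for illustration only (not the paper's data).
steps = [100, 200, 300, 400, 500, 600]
margins = [0.5, 1.8, 1.2, 0.3, -0.4, -1.5]
accuracy = [0.55, 0.95, 0.92, 0.71, 0.38, 0.05]

cross = margin_crossing_step(steps, margins)
drop = collapse_step(steps, accuracy)
print(cross, drop, abs(cross - drop) <= 100)  # 500 500 True
```

Checking the gap against a fixed window (here 100 steps, mirroring the figure quoted above) is one simple way to test whether a margin sign change and an accuracy drop co-occur in a given run.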

Key takeaway

For machine learning engineers optimizing language model pretraining, this research highlights a critical challenge: a rule can be forgotten even after it has been learned and while its supporting data is still present. You must actively monitor the "support frequency" of crucial rules, because loss curves or early performance alone will not reveal the reversal. Restoring a forgotten rule proved ineffective even with heavy data injection, so proactive data curation that keeps desired behaviors supported is needed to prevent their "natural ungrokking".
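
A minimal sketch of support-frequency monitoring follows. It assumes a hypothetical upstream tagger has already labeled each training example as showing the rule winning ("rule"), showing the competing surface pattern winning ("competitor"), or neither (None); the paper's actual measurement procedure may differ. The labels, window size, and function names are illustrative assumptions.

```python
from collections import Counter
from typing import Iterable, List, Optional, Tuple

# Hypothetical per-example labels produced by an upstream tagger:
#   "rule"       -> the example shows the rule winning
#   "competitor" -> the example shows the competing surface pattern winning
#   None         -> the example does not bear on the rule
Label = Optional[str]
WindowStats = Tuple[int, int, Optional[float]]


def _summarize(window_labels: List[Label]) -> WindowStats:
    """Count rule and competitor wins in one window and the rule's share of contested examples."""
    counts = Counter(window_labels)
    rule_wins, competitor_wins = counts["rule"], counts["competitor"]
    contested = rule_wins + competitor_wins
    share = rule_wins / contested if contested else None  # None when nothing was contested
    return rule_wins, competitor_wins, share


def support_by_window(labels: Iterable[Label], window: int) -> List[WindowStats]:
    """Split the label stream into consecutive windows of `window` examples and summarize each.

    A trailing partial window is summarized as well."""
    if window <= 0:
        raise ValueError("window must be positive")
    stats: List[WindowStats] = []
    buffer: List[Label] = []
    for label in labels:
        buffer.append(label)
        if len(buffer) == window:
            stats.append(_summarize(buffer))
            buffer = []
    if buffer:
        stats.append(_summarize(buffer))
    return stats


# Synthetic stream for illustration only: the rule wins early, the competitor later.
stream = ["rule", None, "competitor", "rule", None,
          "competitor", "competitor", None, "competitor", "rule"]
print(support_by_window(stream, window=5))
# [(2, 1, 0.6666666666666666), (1, 3, 0.25)]
```

Reporting the rule's share of contested examples per window, alongside raw counts, makes a rule that is losing ground to its competitor visible in the data stream itself rather than only in downstream accuracy.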

Key insights

Language models can naturally "ungrok" learned rules during pretraining, and whether a rule survives is predictable from its support frequency in the training stream.

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.