Natural Ungrokking: Asymmetric Control of Which Rules Survive Pretraining
Summary
"Natural ungrokking" is the name given to a phenomenon in which small language models forget previously learned rules during pretraining, even though supporting evidence remains in the training data. For instance, a pronoun-gender rule reached 0.94 accuracy by step 925 and then scored near zero by step 3,500. This within-run reversal is predictable from the rule's "support frequency": how often the training stream shows the rule winning. The dynamics were observed across two corpora, three budgets, and three seeds, and they also appeared in public Pythia checkpoints, where collapse depth correlated with model scale. Forgetting occurs through displacement by a competing surface pattern: log-probability margins cross zero within 100 steps of behavioral collapse. Control is asymmetric: destroying a rule is straightforward, but restoring it proved ineffective even with 450 times the natural support.
Key takeaway
For Machine Learning Engineers optimizing language model pretraining, this research highlights a critical challenge: a rule can be forgotten after it has been learned, even while supporting data remains in the training stream. Monitor the "support frequency" of crucial rules directly, because loss curves and early performance alone are not sufficient signals. Restoring a forgotten rule is exceptionally difficult, even with significant data injection, so proactive data curation is needed to prevent "natural ungrokking" of desired behaviors.
Key insights
Language models can naturally "ungrok" learned rules during pretraining, and the reversal is predictable from the rule's support frequency in the training stream.
Principles
- Rule survival depends on training stream "support frequency."
- Forgetting is displacement by competing surface patterns.
- Destroying rules is easier than restoring them.
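The displacement principle rests on log-probability margins crossing zero close to the point of behavioral collapse. The sketch below shows one way such a crossing could be located across checkpoints. It assumes the margin is defined as the log-probability of the rule-consistent continuation minus that of the competing surface pattern, which may differ from the definition used in the original work. The function name `first_zero_crossing` and the numbers in the example are hypothetical and purely illustrative, not results from the paper.

```python
from typing import Optional, Sequence


def first_zero_crossing(
    steps: Sequence[int],
    logp_rule: Sequence[float],
    logp_competitor: Sequence[float],
) -> Optional[int]:
    """Return the first step at which the margin goes from positive to <= 0.

    Assumed margin: logp_rule - logp_competitor at each checkpoint.
    Returns None if the margin never crosses from positive to non-positive.
    """
    if not (len(steps) == len(logp_rule) == len(logp_competitor)):
        raise ValueError("all inputs must have the same length")

    margins = [r - c for r, c in zip(logp_rule, logp_competitor)]
    for i in range(1, len(margins)):
        if margins[i - 1] > 0 and margins[i] <= 0:
            return steps[i]
    return None


# Illustrative values only (not taken from the paper).
example_steps = [900, 1000, 1100, 1200]
example_rule = [-0.5, -0.6, -1.5, -3.0]
example_competitor = [-2.0, -1.5, -1.2, -0.4]
crossing = first_zero_crossing(example_steps, example_rule, example_competitor)
print("margin crosses zero at step:", crossing)
```

Comparing the returned step with the step at which accuracy collapses gives a rough check of whether the two events fall close together in a given run.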
In practice
- Monitor rule support frequency during pretraining.
- Anticipate rule forgetting from competing data patterns.
- Recognize asymmetric control over rule retention.
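As a rough illustration of monitoring support frequency, the sketch below computes, for each consecutive window of training examples, the fraction in which the rule "wins". It assumes a caller-supplied predicate `rule_wins` that decides whether a single example supports the rule; that predicate, the window size, and the function name `support_frequency` are hypothetical and are not taken from the original work, which may measure support differently.

```python
from typing import Callable, Iterable, List


def support_frequency(
    stream: Iterable[str],
    rule_wins: Callable[[str], bool],
    window: int = 1000,
) -> List[float]:
    """Fraction of examples per consecutive window for which rule_wins is True.

    A trailing partial window is reported using its actual size.
    """
    if window <= 0:
        raise ValueError("window must be positive")

    freqs: List[float] = []
    hits = 0
    seen = 0
    for example in stream:
        hits += int(rule_wins(example))
        seen += 1
        if seen == window:
            freqs.append(hits / window)
            hits = 0
            seen = 0
    if seen:  # trailing partial window
        freqs.append(hits / seen)
    return freqs


# Toy stream: "a" stands in for an example where the rule wins (hypothetical).
toy_stream = ["a", "b", "a", "a", "b"]
toy_freqs = support_frequency(toy_stream, lambda s: s == "a", window=2)
print(toy_freqs)
```

Tracking this series alongside rule accuracy makes a drop in support visible in the data stream itself, rather than only after accuracy has already fallen.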
Topics
- Natural Ungrokking
- Language Model Pretraining
- Rule Forgetting
- Support Frequency
- Pythia Models
- Asymmetric Control
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.