When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures
Summary
A study tracked attention-head circuit formation across three 1B-class language models: Pythia 1B, OLMo 1B-0724-hf, and OLMoE 1B-7B-0924. Together they span dense transformer and mixture-of-experts architectures trained on The Pile and DCLM corpora. The researchers applied a participation-ratio (PR) spectral signal and a capability-specific selectivity screen to 10 log-spaced revisions per model. Layers 0 and 1 consistently produce zero BOS-classified heads, which the study treats as an architectural property. The whole-model BOS-attractor fraction emerges with different shapes, ranging from gradual ramps to sharp phase transitions. In DCLM models, induction-circuit formation precedes BOS-attractor formation by a factor of 10-20 in training tokens, indicating two distinct transitions. The capability-specific screen converges to the final induction circuit within 0.3-2% of total training tokens, which enables early circuit identification. Per-head PR is elevated at or before the point where a head crosses its capability-selectivity threshold. The work refines the existing picture of the induction phase transition: in 1B-class DCLM models, the induction and attention-sink transitions are separated by an order of magnitude in tokens and have qualitatively different shapes.
Key takeaway
For machine learning engineers training 1B-class language models, the main point is that induction circuits and attention sinks follow distinct developmental trajectories. The final induction circuit can be identified very early, within 0.3-2% of total training tokens, so circuit analysis can start long before training ends. This suggests that early-stage mechanistic interpretability could inform architectural choices or training strategies, especially for DCLM models, where the two transitions are separated by an order of magnitude in tokens.
Key insights
Induction and attention-sink circuit formations are distinct developmental transitions in 1B-class language models.
Principles
- The zero-BOS floor in Layers 0 and 1 is an architectural property.
- Capability-circuit and attention-sink formation are distinct.
- Early training tokens reveal final induction circuits.
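One way to make the "early tokens reveal the final circuit" principle concrete is to measure how much the set of heads selected at each revision overlaps with the set selected at the final revision. The sketch below is only an illustration: the Jaccard overlap, the 0.8 threshold, the function names, and the toy head sets are all assumptions, not the study's actual convergence criterion or data.

```python
def jaccard(a, b):
    """Jaccard overlap between two collections of (layer, head) pairs."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def first_converged(revisions, threshold=0.8):
    """Return the token count of the first revision whose selected heads
    overlap the final revision's heads by at least `threshold`.

    revisions: list of (tokens_seen, selected_heads), ordered by tokens_seen.
    The threshold is a hypothetical choice, not a value from the study.
    """
    final_heads = revisions[-1][1]
    for tokens_seen, heads in revisions:
        if jaccard(heads, final_heads) >= threshold:
            return tokens_seen
    return revisions[-1][0]


# Toy, made-up revisions: (tokens seen, heads flagged by the screen).
toy_revisions = [
    (1_000_000, []),
    (10_000_000, [(5, 1)]),
    (100_000_000, [(5, 1), (6, 3)]),
    (1_000_000_000, [(5, 1), (6, 3)]),
]

converged_at = first_converged(toy_revisions)
fraction_of_training = converged_at / toy_revisions[-1][0]
```

Note that this only finds the first revision that reaches the threshold; a stricter check would also require the overlap to stay above the threshold at every later revision.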
Method
Track attention-head circuit development using participation-ratio spectral signals and capability-specific selectivity screens across model revisions.
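The participation ratio is commonly defined for a non-negative spectrum as the squared sum of the values divided by the sum of their squares, which gives a rough "effective number of dimensions". The sketch below assumes that standard definition. The summary does not say which per-head matrix the study takes the spectrum from, so the input here is simply a list of non-negative values (for example, squared singular values), and the function name is illustrative.

```python
from typing import Sequence


def participation_ratio(spectrum: Sequence[float]) -> float:
    """PR = (sum of values)^2 / (sum of squared values).

    Assumes `spectrum` holds non-negative values such as eigenvalues or
    squared singular values. PR ranges from 1 (a single dominant
    direction) to len(spectrum) (a perfectly flat spectrum).
    """
    total = sum(spectrum)
    sum_of_squares = sum(x * x for x in spectrum)
    if sum_of_squares == 0.0:
        raise ValueError("spectrum must contain at least one non-zero value")
    return (total * total) / sum_of_squares


# A flat spectrum uses all four directions equally.
flat_pr = participation_ratio([1.0, 1.0, 1.0, 1.0])

# A peaked spectrum collapses onto a single direction.
peaked_pr = participation_ratio([1.0, 0.0, 0.0, 0.0])
```

Under this definition, tracking PR per head across revisions yields one scalar per head per checkpoint, which is what makes it usable as a cheap developmental signal.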
In practice
- Identify induction circuits early in training (within 0.3-2% of total training tokens).
- Distinguish induction from attention-sink transitions.
- Check whether a head's PR is elevated at or before the point where it crosses its capability-selectivity threshold.
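To separate attention-sink heads from induction heads in practice, one needs some rule for calling a head a BOS attractor. The sketch below uses a simple assumed rule: a head is flagged when its average attention weight on position 0 (the BOS token) exceeds a threshold, skipping the first query position, which can only attend to BOS under causal masking. The 0.5 threshold, the function names, and the toy attention rows are hypothetical and are not the study's classification criterion.

```python
def bos_mass(attn_rows):
    """Mean attention weight on position 0, over query positions 1..T-1.

    attn_rows: list of rows, one per query position, each a list of
    attention weights over key positions (causal, rows sum to 1).
    """
    later_rows = attn_rows[1:]
    if not later_rows:
        raise ValueError("need at least two query positions")
    return sum(row[0] for row in later_rows) / len(later_rows)


def bos_attractor_fraction(heads, threshold=0.5):
    """Return (fraction of flagged heads, list of flagged head names).

    heads: dict mapping a head name to its attention rows.
    threshold: hypothetical cutoff on mean BOS attention mass.
    """
    flagged = [name for name, rows in heads.items() if bos_mass(rows) > threshold]
    return len(flagged) / len(heads), flagged


# Toy attention patterns for two hypothetical heads over three tokens.
toy_heads = {
    "L2H0": [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.8, 0.1, 0.1]],  # sink-like
    "L2H1": [[1.0, 0.0, 0.0], [0.2, 0.8, 0.0], [0.1, 0.2, 0.7]],  # not sink-like
}

fraction, flagged_heads = bos_attractor_fraction(toy_heads)
```

Computing this fraction at each revision gives a whole-model curve whose shape (gradual ramp or sharp jump) can then be compared against the timing of the induction screen.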
Topics
- Attention Circuits
- Mechanistic Interpretability
- Language Model Training
- Transformer Architectures
- Mixture-of-Experts
- DCLM Corpus
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.