When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures

2026-06-01 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A study tracked attention-head circuit formation across three 1B-class language models (Pythia 1B, OLMo 1B-0724-hf, and OLMoE 1B-7B-0924), spanning dense transformer and mixture-of-experts architectures trained on The Pile and DCLM corpora. Using a participation-ratio (PR) spectral signal and a capability-specific selectivity screen across 10 log-spaced revisions per model, the researchers found that Layers 0 and 1 consistently produce zero BOS-classified heads, a floor the study identifies as an architectural property. The whole-model BOS-attractor fraction follows distinct emergence shapes, from gradual ramps to sharp phase transitions. In the DCLM-trained models, induction-circuit formation precedes BOS-attractor formation by a factor of 10-20 in training tokens, indicating two distinct transitions. The capability-specific screen converges to the final induction circuit within 0.3-2% of total training tokens, which enables early circuit identification. Per-head PR is elevated at or before the point where a head crosses its capability-selectivity threshold. The work refines the existing picture of the induction phase transition: in 1B-class DCLM models, the induction and attention-sink transitions are separated by an order of magnitude in tokens and have qualitatively different shapes.

Key takeaway

For machine learning engineers training 1B-class language models, the main point is that induction and attention-sink circuits follow distinct developmental trajectories. The final induction circuit can be identified very early, within 0.3-2% of total training tokens, which allows earlier analysis and may help when tuning training schedules. Early-stage mechanistic interpretability could therefore inform architectural choices or training strategies, especially for DCLM-trained models, where the two transitions are separated by an order of magnitude in tokens.

Key insights

Induction and attention-sink circuit formations are distinct developmental transitions in 1B-class language models.

Principles

The zero-BOS floor in Layers 0 and 1 is an architectural property.
Capability-circuit and attention-sink formation are distinct.
Early training tokens reveal final induction circuits.
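
To make the BOS-attractor idea concrete, here is a minimal sketch of how a whole-model BOS-attractor fraction could be computed from attention weights. The summary does not give the study's classification rule, so everything specific here is an assumption for illustration: the function name bos_attractor_fraction, the rule "mean attention mass on position 0 exceeds a threshold", and the threshold value 0.5.

```python
import numpy as np

def bos_attractor_fraction(attn: np.ndarray, threshold: float = 0.5):
    """Classify heads as BOS attractors and return the whole-model fraction.

    attn: attention weights with shape (n_layers, n_heads, seq, seq),
          where each row (query) sums to 1 over keys.
    threshold: hypothetical cutoff on mean attention mass sent to position 0.
    """
    # Skip query position 0: under causal masking it can only attend to itself,
    # so it would trivially count as attending to BOS.
    bos_mass = attn[:, :, 1:, 0].mean(axis=-1)   # shape (n_layers, n_heads)
    is_bos = bos_mass > threshold                # boolean mask per head
    return is_bos, float(is_bos.mean())
```

Summing the returned mask along the head axis gives a per-layer count of BOS-classified heads, which is the quantity one would inspect to check a zero floor in Layers 0 and 1.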

Method

Track attention-head circuit development using participation-ratio spectral signals and capability-specific selectivity screens across model revisions.
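
The participation ratio has a standard definition, PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues), which estimates how many dimensions carry meaningful variance. The sketch below applies it to a matrix of per-head activations. Which matrix the study actually takes the spectrum of is not stated in this summary, so the choice of a centered activation matrix, and the name participation_ratio, are assumptions.

```python
import numpy as np

def participation_ratio(activations: np.ndarray) -> float:
    """Participation ratio of the covariance spectrum of `activations`.

    activations: array of shape (n_samples, n_features), for example the
                 outputs of one attention head over many token positions
                 (an assumed input, not confirmed by the source).
    """
    centered = activations - activations.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to the covariance eigenvalues;
    # the constant factor cancels in the ratio below.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    eigenvalues = singular_values ** 2
    denom = np.sum(eigenvalues ** 2)
    if denom == 0.0:
        return 0.0  # degenerate case: all activations identical
    return float(np.sum(eigenvalues) ** 2 / denom)
```

PR is close to 1 when a single direction dominates and approaches the feature dimension when variance is spread evenly, so computing it per head at each revision yields one curve per head to track over training.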

In practice

Identify induction circuits early in training (within the first 0.3-2% of total training tokens).
Distinguish induction from attention-sink transitions.
Watch for elevated per-head PR as a signal that a head is at or near its capability-selectivity threshold.
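
One way to separate the two transitions in practice is to record, for each metric, the first checkpoint at which it crosses a threshold and then compare the token counts. The sketch below does that. The function names, the thresholds, and the use of a simple first-crossing rule are illustrative assumptions, not the study's procedure.

```python
from typing import Optional, Sequence

def first_crossing(tokens: Sequence[float],
                   metric: Sequence[float],
                   threshold: float) -> Optional[float]:
    """Token count at the first checkpoint where metric >= threshold, else None."""
    for t, m in zip(tokens, metric):
        if m >= threshold:
            return float(t)
    return None

def separation_ratio(tokens: Sequence[float],
                     induction_metric: Sequence[float],
                     sink_metric: Sequence[float],
                     induction_threshold: float,
                     sink_threshold: float) -> Optional[float]:
    """Ratio of sink-onset tokens to induction-onset tokens.

    Returns None if either metric never crosses its threshold, or if the
    induction onset is at zero tokens (ratio undefined).
    """
    t_ind = first_crossing(tokens, induction_metric, induction_threshold)
    t_sink = first_crossing(tokens, sink_metric, sink_threshold)
    if t_ind is None or t_sink is None or t_ind == 0.0:
        return None
    return t_sink / t_ind
```

With only 10 log-spaced revisions per model, a first-crossing estimate is coarse: the true onset lies somewhere between the crossing checkpoint and the one before it, so the ratio is best read as an order-of-magnitude figure.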

Topics

Attention Circuits
Mechanistic Interpretability
Language Model Training
Transformer Architectures
Mixture-of-Experts
DCLM Corpus

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.