Pattern Selectivity is Not Task-Causal Structure: A Cross-Architecture Mechanistic Study of Composed-Task Circuits in 1B-Class Language Models
Summary
This study asks whether attention-head circuit identification, based on task-pattern selectivity and causal ablation, yields consistent mechanistic claims across 1B-class language models. It analyzes Pythia 1B, OLMo 1B, and OLMoE 1B-7B on four composed tasks: indirect-object identification (IOI), greater-than, successor sequences, and variable binding. The screen-and-ablate recipe ports across models, but the specific circuits it identifies do not: no two of the 12 (task, model) cells share the same primary causal screen at comparable effect size. The paper introduces a five-category screen-outcome taxonomy (primary cause, secondary cause, correlate, interferer, null) and shows that all five categories appear. It also proposes a falsifiable hypothesis: OLMoE 1B-7B builds its composed-task circuits on a foundational prev-token positional substrate for three of the four tasks, with IOI as the exception. Finally, the work highlights that top-1 accuracy alone is often an insufficient metric for ablation studies.
Key takeaway
If you are an AI scientist or machine learning engineer doing mechanistic interpretability, do not assume that specific attention-head circuits transfer across 1B-class language models or architectures. Re-derive and causally verify circuits for each new model and task, using a full family of candidate screens. Report both Δtop-1 and Δlogit-diff in ablation studies, because top-1 accuracy alone can miss significant margin-only effects in redundant models.
Key insights
Specific mechanistic circuits for composed tasks do not transfer across 1B-class language models.
Principles
- Circuit identification recipes port, but specific circuits do not.
- Per-model causal verification is essential for task-specific findings.
- Logit-margin metrics reveal effects top-1 accuracy misses.
Method
Apply a three-step screen-and-ablate protocol: spectral signal, task-pattern screen, and causal verification with a 10-seed matched-random null.
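The causal-verification step can be sketched as below. This is a minimal illustration, not the paper's implementation: the summary does not say how the random null is matched, so the sketch assumes "matched" means one control head drawn from the same layer as each candidate head. The names `matched_random_null`, `effect_fn`, and `toy_effect` are hypothetical, and `effect_fn` stands in for whatever routine ablates a set of heads and returns the change in a task metric.

```python
import random
import statistics
from typing import Callable, List, Sequence, Tuple

Head = Tuple[int, int]  # (layer, head index); a hypothetical representation


def matched_random_null(
    candidate: Sequence[Head],
    heads_per_layer: int,
    effect_fn: Callable[[Sequence[Head]], float],
    n_seeds: int = 10,
) -> dict:
    """Compare the ablation effect of `candidate` against random control sets."""
    excluded = set(candidate)
    null_effects: List[float] = []
    for seed in range(n_seeds):
        rng = random.Random(seed)  # one reproducible draw per seed
        control: List[Head] = []
        for layer, _ in candidate:
            # Assumed matching rule: one control head from the same layer,
            # never a candidate head and never a repeat within this draw.
            pool = [
                (layer, h)
                for h in range(heads_per_layer)
                if (layer, h) not in excluded and (layer, h) not in control
            ]
            control.append(rng.choice(pool))
        null_effects.append(effect_fn(control))
    observed = effect_fn(candidate)
    return {
        "observed": observed,
        "null_mean": statistics.mean(null_effects),
        "null_std": statistics.stdev(null_effects),
        # True only if the candidate effect is larger than every null draw.
        "beats_all_seeds": abs(observed) > max(abs(e) for e in null_effects),
    }


# Toy stand-in for a real ablation run: circuit heads cost 1.0 logit each.
TRUE_CIRCUIT = {(0, 1), (2, 3)}


def toy_effect(heads: Sequence[Head]) -> float:
    return sum(-1.0 if h in TRUE_CIRCUIT else -0.01 for h in heads)


result = matched_random_null(
    [(0, 1), (2, 3)], heads_per_layer=8, effect_fn=toy_effect
)
```

A screen whose candidate heads do not beat the null draws would not count as causally verified under this sketch; a stricter matching rule (for example, matching on attention-pattern statistics) could be swapped in for the per-layer draw.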
In practice
- Re-derive specific circuits for each new model and task.
- Use individual head ablations to diagnose interferers.
- Report Δtop-1 and Δlogit-diff in ablation studies.
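The last recommendation can be illustrated with a small sketch that computes both metrics from clean and ablated logits. It assumes logit-diff means the correct-token logit minus a designated foil-token logit, averaged over prompts; the function name `ablation_deltas` and the toy numbers are hypothetical, not taken from the paper.

```python
from typing import Dict, List, Sequence


def ablation_deltas(
    clean_logits: Sequence[Sequence[float]],
    ablated_logits: Sequence[Sequence[float]],
    correct_ids: Sequence[int],
    foil_ids: Sequence[int],
) -> Dict[str, float]:
    """Return the change in top-1 accuracy and in mean logit difference."""

    def top1_acc(logits: Sequence[Sequence[float]]) -> float:
        # A prompt counts as a hit when the argmax token is the correct one.
        hits: List[bool] = [
            max(range(len(row)), key=row.__getitem__) == c
            for row, c in zip(logits, correct_ids)
        ]
        return sum(hits) / len(hits)

    def mean_logit_diff(logits: Sequence[Sequence[float]]) -> float:
        # Margin between the correct token and the foil token, per prompt.
        diffs = [row[c] - row[f] for row, c, f in zip(logits, correct_ids, foil_ids)]
        return sum(diffs) / len(diffs)

    return {
        "delta_top1": top1_acc(ablated_logits) - top1_acc(clean_logits),
        "delta_logit_diff": mean_logit_diff(ablated_logits)
        - mean_logit_diff(clean_logits),
    }


# Two toy prompts over a three-token vocabulary. The ablation shrinks the
# margin on both prompts but never flips the top-1 prediction.
clean = [[5.0, 1.0, 0.0], [4.0, 0.0, 1.0]]
ablated = [[2.0, 1.0, 0.0], [1.5, 0.0, 1.0]]
deltas = ablation_deltas(clean, ablated, correct_ids=[0, 0], foil_ids=[1, 2])
```

In this toy case Δtop-1 is 0.0 while Δlogit-diff is -2.75, which is the kind of margin-only effect that a top-1-only report would miss.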
Topics
- Mechanistic Interpretability
- Attention Heads
- Mixture-of-Experts
- Causal Ablation
- Language Model Architectures
- Composed Tasks
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.