Pattern Selectivity is Not Task-Causal Structure: A Cross-Architecture Mechanistic Study of Composed-Task Circuits in 1B-Class Language Models
Summary
This mechanistic study asks whether a "screen-and-ablate" recipe consistently identifies attention-head circuits across 1B-class language models and composed tasks. It applies a unified protocol to Pythia 1B, OLMo 1B, and OLMoE 1B-7B on four tasks: indirect-object identification, greater-than, successor sequences, and variable binding. Across the 12 (task, model) cells, each run with ten seeds, the key finding is that the same behavioral capability is implemented via different attention-pattern types in different models, which indicates that pattern selectivity is not task-causal structure. The study introduces a five-category screen-outcome taxonomy and observes all five outcomes. It also proposes a falsifiable hypothesis for the MoE model (OLMoE 1B-7B): composed-task circuits build on a "previous-token positional substrate," with indirect-object identification as an exception.
Key takeaway
If you design mechanistic interpretability studies, recognize that the attention-head circuits implementing the same task vary significantly across 1B-class language models. Do not assume that pattern selectivity indicates task-causal structure; instead, verify candidate circuits causally in each model rather than carrying over a circuit found in another. This keeps mechanistic claims accurate and avoids misreading model behavior from patterns that do not generalize.
Key insights
Task-specific behavioral capabilities are implemented through distinct attention-pattern types across different 1B-class language models.
Principles
- Pattern selectivity is not task-causal structure.
- Mechanistic claims need per-model causal verification.
- Spectral participation-ratio indicates specialized computation.
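The summary does not say which spectrum the participation ratio is computed over, so the following is only a sketch of the standard definition, PR = (sum of eigenvalues)^2 / (sum of squared eigenvalues), applied to the covariance spectrum of an activation matrix. The function names and the choice of a covariance spectrum are assumptions made for illustration, not the study's method.

```python
import numpy as np

def participation_ratio(eigenvalues):
    """Standard participation ratio of a non-negative spectrum.

    Equals 1 when a single eigenvalue carries everything and
    equals n when n eigenvalues are all equal.
    """
    lam = np.clip(np.asarray(eigenvalues, dtype=float), 0.0, None)
    total = lam.sum()
    if total == 0.0:
        return 0.0
    return float(total ** 2 / np.sum(lam ** 2))

def activation_participation_ratio(acts):
    """Participation ratio of the covariance spectrum of `acts`.

    `acts` has shape (n_samples, n_features); this input layout is an
    assumption for the sketch, not something the source specifies.
    """
    acts = np.asarray(acts, dtype=float)
    centered = acts - acts.mean(axis=0, keepdims=True)
    # Squared singular values are proportional to covariance eigenvalues;
    # the ratio is unaffected by the constant factor.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    return participation_ratio(singular_values ** 2)
```

A low value means the variance is concentrated in a few directions, while a value near the feature count means it is spread evenly.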
Method
Identify attention-head circuits by task-pattern selectivity, then verify by causal ablation against a matched-random null. This recipe ports across pipelines, but specific circuits do not.
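The verification step of this recipe can be sketched as follows, under stated assumptions. `ablate_effect` is a hypothetical callable that returns the performance drop from ablating a set of heads, the null is matched only on set size (the study may match on more, such as layer), and the toy effect function stands in for a real model.

```python
import numpy as np

def matched_random_null(ablate_effect, candidate, all_heads, n_draws=200, seed=0):
    """Compare ablating `candidate` heads against size-matched random sets.

    Returns the observed effect, the null effects, and an empirical
    one-sided p-value (with the usual +1 correction).
    """
    rng = np.random.default_rng(seed)
    candidate = list(candidate)
    excluded = set(candidate)
    pool = [h for h in all_heads if h not in excluded]
    observed = ablate_effect(candidate)
    null = np.empty(n_draws)
    for i in range(n_draws):
        # Draw a random head set of the same size as the candidate set.
        idx = rng.choice(len(pool), size=len(candidate), replace=False)
        null[i] = ablate_effect([pool[j] for j in idx])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_draws)
    return observed, null, float(p_value)

# Toy stand-in for a model: two heads matter, the rest barely do.
heads = [(layer, head) for layer in range(4) for head in range(8)]
important = {(1, 3), (2, 5)}

def toy_effect(head_set):
    return sum(1.0 if h in important else 0.01 for h in head_set)

observed, null, p_value = matched_random_null(toy_effect, important, heads)
```

In this toy setting the screened heads produce a far larger ablation effect than any random set of the same size, which is the pattern a causally verified circuit should show.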
In practice
- Apply the five-category screen-outcome taxonomy to classify each screen result.
- Test MoE models for a previous-token positional substrate.
- Use the task-pattern screen to nominate candidate heads, then verify them by causal ablation.
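One simple way to probe for previous-token behavior is to score how much attention each query position places on the position immediately before it. The summary does not state how the study measures the substrate, so the metric and the name `previous_token_score` below are assumptions for illustration.

```python
import numpy as np

def previous_token_score(attn):
    """Mean attention weight from each query position to the prior position.

    `attn` is a (seq_len, seq_len) attention pattern for one head, with
    rows as query positions and columns as key positions. The first
    query position has no previous token and is skipped.
    """
    attn = np.asarray(attn, dtype=float)
    # offset=-1 selects attn[i, i - 1] for i = 1 .. seq_len - 1.
    return float(np.mean(np.diagonal(attn, offset=-1)))

# A head that always attends to the previous token.
seq_len = 4
prev_head = np.zeros((seq_len, seq_len))
prev_head[0, 0] = 1.0
for i in range(1, seq_len):
    prev_head[i, i - 1] = 1.0

# A head that spreads attention uniformly over the causal prefix.
uniform_head = np.tril(np.ones((seq_len, seq_len)))
uniform_head /= uniform_head.sum(axis=1, keepdims=True)

prev_score = previous_token_score(prev_head)
uniform_score = previous_token_score(uniform_head)
```

A score near 1 suggests a previous-token head; as the screen-and-ablate recipe implies, a high score would still need causal ablation before it supports a mechanistic claim.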
Topics
- Language Models
- Mechanistic Interpretability
- Attention Mechanisms
- Causal Ablation
- Mixture-of-Experts
- Model Architecture
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.