Pattern Selectivity is Not Task-Causal Structure: A Cross-Architecture Mechanistic Study of Composed-Task Circuits in 1B-Class Language Models

Source: cs.AI updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Mechanistic Interpretability, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

This study investigates whether attention-head circuit identification, based on task-pattern selectivity and causal ablation, yields consistent mechanistic claims across different 1B-class language models. It analyzes Pythia 1B, OLMo 1B, and OLMoE 1B-7B on four composed tasks: indirect-object identification (IOI), greater-than, successor sequences, and variable binding. It finds that the screen-and-ablate recipe transfers across models but the specific circuits it identifies do not: no two of the 12 (task, model) cells share the same primary causal screen at comparable effect size. The paper introduces a five-category taxonomy of screen outcomes (primary cause, secondary cause, correlate, interferer, null) and shows that all five categories occur. It also proposes a falsifiable hypothesis: OLMoE 1B-7B builds its composed-task circuits on a foundational prev-token positional substrate for three of the four tasks, with IOI as the exception. Finally, the work shows that top-1 accuracy alone is often an insufficient metric for ablation studies.

Key takeaway

AI scientists and machine learning engineers doing mechanistic interpretability should not assume that specific attention-head circuits transfer across 1B-class language models or architectures. Re-derive and causally verify circuits for each new model and task using a full family of candidate screens. Report both Δtop-1 and Δlogit-diff in ablation studies, because top-1 accuracy alone can miss significant margin-only effects in redundant models.
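To make the two-metric recommendation concrete, here is a minimal sketch of how Δtop-1 and Δlogit-diff could be computed side by side. The function name `ablation_deltas`, the two-candidate definition of top-1 (correct token versus a single distractor), and the toy logits are all assumptions for illustration; none of them come from the paper.

```python
from statistics import mean

def ablation_deltas(clean, ablated):
    """Return (delta_top1, delta_logit_diff) for paired prompts.

    Each argument is a list of (correct_logit, distractor_logit) pairs,
    one pair per prompt, in the same order for both conditions.
    This two-candidate setup is a simplifying assumption.
    """
    if len(clean) != len(ablated) or not clean:
        raise ValueError("need two non-empty, equally sized lists")

    def top1(pairs):
        # Fraction of prompts where the correct token outscores the distractor.
        return mean(1.0 if c > d else 0.0 for c, d in pairs)

    def logit_diff(pairs):
        # Mean margin between the correct and distractor logits.
        return mean(c - d for c, d in pairs)

    return top1(ablated) - top1(clean), logit_diff(ablated) - logit_diff(clean)

# Toy numbers, invented for illustration only.
clean = [(5.0, 1.0), (4.0, 2.0), (6.0, 3.0)]
ablated = [(3.0, 2.0), (2.5, 2.0), (4.0, 3.5)]

d_top1, d_logit_diff = ablation_deltas(clean, ablated)
print(f"delta top-1: {d_top1:+.2f}, delta logit-diff: {d_logit_diff:+.2f}")
```

In this toy case every prompt is still answered correctly after ablation, so Δtop-1 is zero, while the mean margin drops by more than two logits. That is the kind of margin-only effect a top-1-only report would hide.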

Key insights

Specific mechanistic circuits for composed tasks do not transfer across 1B-class language models.

Principles

Method

Apply a three-step screen-and-ablate protocol: spectral signal, task-pattern screen, and causal verification with a 10-seed matched-random null.
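The causal-verification step can be sketched as follows. This is a rough illustration under stated assumptions, not the paper's implementation: the summary does not say how the random null is "matched", so the sketch assumes matching on the number of heads per layer, and `effect_fn` is a hypothetical callback standing in for "ablate these heads and return Δlogit-diff".

```python
import random
from statistics import mean

def matched_random_null(screened, heads_per_layer, effect_fn, n_seeds=10):
    """Ablation effects for random head sets matched to `screened`.

    `screened` is a list of (layer, head) pairs. For each seed, the same
    number of heads is drawn per layer from heads NOT in the screened set
    (an assumed matching rule), and `effect_fn` scores the random set.
    """
    null_effects = []
    for seed in range(n_seeds):
        rng = random.Random(seed)  # one reproducible draw per seed
        sample = []
        for layer in sorted({lyr for lyr, _ in screened}):
            chosen = {h for lyr, h in screened if lyr == layer}
            pool = [h for h in range(heads_per_layer) if h not in chosen]
            sample += [(layer, h) for h in rng.sample(pool, len(chosen))]
        null_effects.append(effect_fn(sample))
    return null_effects

# Hypothetical screened heads and a toy effect function (invented numbers).
SCREENED = [(3, 1), (5, 7)]

def toy_effect(heads):
    # Stand-in for a real ablation run: screened heads matter, others barely.
    return sum(-2.0 if h in SCREENED else -0.1 for h in heads)

screened_effect = toy_effect(SCREENED)
null_effects = matched_random_null(SCREENED, heads_per_layer=8, effect_fn=toy_effect)

# Count seeds whose random ablation hurts at least as much as the screened set.
n_as_extreme = sum(e <= screened_effect for e in null_effects)
print(screened_effect, mean(null_effects), n_as_extreme)
```

A screen would count as causally verified in this sketch only if the screened-set effect clearly exceeds what the matched random sets produce. The threshold for "clearly" is a design choice that the summary does not specify.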

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.