Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

"Spectral Probe-Circuits" is a three-step recipe for identifying attention-head circuits in pretrained transformers, validated across an 8x parameter range (51M to 1B-active/7B-total) and diverse architectures. The method ranks heads by a spectral signal (the time-integrated participation ratio, or PR-integral), filters that ranking for task-specific candidates with a task-pattern screen, and verifies the candidates causally via group ablation. Three findings stand out: every model tested contains a small induction circuit of 3–6 heads; the spectral signal predicts seed-specific circuits without task labels; and the fraction of specialized heads stays at roughly 17–19% across scale, while individual capability circuits remain small (3–11 heads) and grow sublinearly with total head count.
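For readers unfamiliar with the spectral signal, the participation ratio has a standard definition, written out below. The summary does not say which per-head matrix supplies the spectrum or what the "time" axis is, so the symbols here are assumptions: \lambda_i(t) are the nonnegative eigenvalues (for example, squared singular values) of some per-head matrix at checkpoint t, and the integral runs over training checkpoints t_0, ..., t_K.

```latex
% Participation ratio of a spectrum with n nonnegative eigenvalues:
\mathrm{PR}(t) = \frac{\left(\sum_{i=1}^{n} \lambda_i(t)\right)^{2}}{\sum_{i=1}^{n} \lambda_i(t)^{2}},
\qquad 1 \le \mathrm{PR}(t) \le n .

% Time-integrated version, approximated with the trapezoidal rule
% over checkpoints t_0 < t_1 < \dots < t_K (an assumed discretization):
\int_{t_0}^{t_K} \mathrm{PR}(t)\,dt \;\approx\;
\sum_{k=1}^{K} \frac{\mathrm{PR}(t_{k-1}) + \mathrm{PR}(t_k)}{2}\,(t_k - t_{k-1}) .
```

PR equals 1 when a single eigenvalue carries all the weight and n when all eigenvalues are equal, so it acts as an effective count of active spectral directions.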

Key takeaway

For research scientists and ML engineers analyzing transformer behavior, this recipe offers a way to pinpoint specific attention-head circuits, starting from a spectral ranking that needs no task labels. You can apply the three steps during pretraining or to fully trained models to understand how a capability is implemented mechanistically. The recipe identifies causally necessary circuits and their compositional dependencies, even when the specific heads involved vary across models or training seeds.

Key insights

A three-step recipe identifies transformer attention-head circuits using spectral signals, task-pattern screens, and causal verification.

Method

The recipe has three steps: (1) compute a per-head spectral signal (the PR-integral); (2) apply a task-pattern screen to keep only the heads that are selective for the task; and (3) verify the surviving heads causally by ablating them as a group and comparing the effect against matched-random and upper-bound controls.
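The three steps can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the choice of per-head matrix, the ranking direction (higher PR-integral first), the `top_k` and `min_score` parameters, size-only matching for the random control, and the user-supplied `loss_with_ablated` callback are all hypothetical. The upper-bound control is left out because the summary does not define it.

```python
import numpy as np


def participation_ratio(matrix):
    """PR = (sum lam)^2 / sum lam^2, with lam = squared singular values."""
    s = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    lam = s ** 2
    return float(lam.sum() ** 2 / (lam ** 2).sum())


def pr_integral(matrices_over_time, steps):
    """Step 1: trapezoidal integral of PR over checkpoints for one head.

    ASSUMPTION: `matrices_over_time` holds one per-head matrix per
    checkpoint; the summary does not specify which matrix is used.
    """
    pr = np.array([participation_ratio(m) for m in matrices_over_time])
    steps = np.asarray(steps, dtype=float)
    return float(np.sum(0.5 * (pr[1:] + pr[:-1]) * np.diff(steps)))


def screen(signal, task_score, top_k, min_score):
    """Step 2: keep top-ranked heads whose task-pattern score is high enough.

    ASSUMPTION: heads are ranked by descending signal; `task_score` is any
    per-head measure of how strongly a head matches the task pattern.
    """
    ranked = sorted(signal, key=signal.get, reverse=True)[:top_k]
    return [h for h in ranked if task_score[h] >= min_score]


def verify_group(candidate, all_heads, loss_with_ablated, n_random=20, seed=0):
    """Step 3: compare group ablation against size-matched random groups.

    `loss_with_ablated(heads)` is a hypothetical callback returning the task
    loss with the given heads ablated. Returns (candidate_effect,
    mean_random_effect), both measured as loss increase over no ablation.
    """
    rng = np.random.default_rng(seed)
    base = loss_with_ablated([])
    effect = loss_with_ablated(list(candidate)) - base
    cand = set(candidate)
    others = [h for h in all_heads if h not in cand]
    random_effects = []
    for _ in range(n_random):
        idx = rng.choice(len(others), size=len(candidate), replace=False)
        group = [others[i] for i in idx]
        random_effects.append(loss_with_ablated(group) - base)
    return float(effect), float(np.mean(random_effects))
```

A candidate group counts as causally relevant in this sketch when its ablation effect clearly exceeds the mean effect of the random groups; how large that gap must be is a judgment the summary does not quantify.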

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.