Spectral Probe-Circuits: A Three-Step Recipe for Identifying Attention-Head Circuits in Pretrained Transformers

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

The paper presents "Spectral Probe-Circuits," a three-step recipe for identifying attention-head circuits in pretrained transformers, validated across an 8x parameter range (51M to 1B-active/7B-total) and diverse architectures. The method ranks heads by a spectral signal (the time-integrated participation ratio, or PR-integral), filters them for task-specific candidates with a task-pattern screen, and verifies the surviving candidates causally via group ablation. Key findings: a small (3–6 head) induction circuit is identified in every model tested; the spectral signal predicts seed-specific circuits without task labels; and the fraction of specialized heads is conserved at ~17–19% across scale, while individual capability circuits stay small (3–11 heads) and grow sublinearly with total head count.

Key takeaway

For research scientists and ML engineers analyzing transformer behavior, this recipe offers a way to pinpoint specific attention-head circuits, starting from a spectral signal that needs no task labels. You can apply the three steps during pretraining or on fully trained models to understand how a capability is implemented mechanistically. The approach helps you identify causally necessary circuits and their compositional dependencies, even when the specific heads involved vary across models or training seeds.

Key insights

A three-step recipe identifies transformer attention-head circuits using spectral signals, task-pattern screens, and causal verification.

Principles

Spectral signals indicate general content-dependent computation.
Task-pattern screens specialize general signals to specific capabilities.
Causal verification with matched-random controls ensures falsifiable claims.
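
To illustrate the last principle, here is a minimal sketch of a matched-random control, not the paper's implementation. It assumes a hypothetical callable `ablation_effect(heads)` that returns the drop in task performance when a group of heads is ablated together. Matching on group size only, and drawing the random groups from heads outside the candidate set, are assumptions made for this sketch.

```python
import random


def matched_random_control(candidate, all_heads, ablation_effect,
                           n_draws=200, seed=0):
    """Compare a candidate group's ablation effect to size-matched random groups.

    `ablation_effect` is a hypothetical callable: it takes a list of heads and
    returns a number, where larger means more damage to the task.
    Returns (observed effect, list of random effects, empirical p-value).
    """
    rng = random.Random(seed)
    candidate_set = set(candidate)
    # Assumption: random groups are drawn from heads outside the candidate set.
    pool = [h for h in all_heads if h not in candidate_set]
    k = len(candidate)
    if k > len(pool):
        raise ValueError("not enough non-candidate heads to draw a matched group")

    observed = ablation_effect(list(candidate))
    random_effects = [ablation_effect(rng.sample(pool, k)) for _ in range(n_draws)]

    # Fraction of random groups that hurt at least as much as the candidate,
    # with the usual +1 correction so the estimate is never exactly zero.
    exceed = sum(1 for e in random_effects if e >= observed)
    p_value = (exceed + 1) / (n_draws + 1)
    return observed, random_effects, p_value
```

The claim "these heads form the circuit" is falsifiable in this setup: if random groups of the same size cause comparable damage, the empirical p-value is large and the claim fails.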

Method

The recipe involves computing a per-head spectral signal (PR-integral), applying a task-pattern screen for selectivity, and performing causal verification via group ablation against matched-random and upper-bound controls.
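
The spectral signal in the first step can be sketched as follows, under stated assumptions. The participation ratio of a spectrum λ is the standard quantity (Σλ)² / Σλ². This summary does not say which per-head matrix the paper takes the spectrum of, or how it integrates over time, so the sketch assumes one 2-D matrix per head per training checkpoint, squared singular values as the spectrum, and trapezoidal integration over training steps. The function names are hypothetical.

```python
import numpy as np


def participation_ratio(matrix):
    """PR = (sum of lam)^2 / (sum of lam^2), with lam the squared singular values.

    PR is 1 for a rank-1 matrix and equals the dimension when all
    singular values are equal.
    """
    singular_values = np.linalg.svd(np.asarray(matrix, dtype=float), compute_uv=False)
    lam = singular_values ** 2
    total = lam.sum()
    if total == 0.0:
        return 0.0
    return float(total ** 2 / (lam ** 2).sum())


def pr_integral(matrices, steps):
    """Trapezoidal integral of PR over training steps for a single head.

    `matrices` holds one per-head matrix per checkpoint (an assumption);
    `steps` holds the matching training-step values.
    """
    pr = np.array([participation_ratio(m) for m in matrices])
    steps = np.asarray(steps, dtype=float)
    return float(np.sum(0.5 * (pr[1:] + pr[:-1]) * np.diff(steps)))


def rank_heads(scores):
    """Return head identifiers sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)
```

Because nothing here depends on task labels, a ranking like this can be computed before deciding which capability to screen for.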

In practice

Use PR-integral as a fast pre-filter for specialized heads.
Screen for canonical attention patterns (e.g., induction, previous-token).
Ablate candidate circuits against matched-random controls for validation.
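
The pattern screen in the second item can be sketched with two commonly used attention-pattern scores. This is a hedged illustration rather than the paper's exact screen. It assumes each head's attention is available as a square, causal (query, key) matrix, and that the induction score is measured on a sequence whose second half repeats the first, with `period` the length of the repeated segment. The function names and the threshold are hypothetical.

```python
import numpy as np


def previous_token_score(attn):
    """Mean attention mass each position places on the position just before it."""
    attn = np.asarray(attn, dtype=float)
    idx = np.arange(1, attn.shape[0])
    return float(attn[idx, idx - 1].mean())


def induction_score(attn, period):
    """Mean attention from each repeated token to the token that followed
    its earlier occurrence, i.e. from position i to position i - period + 1."""
    attn = np.asarray(attn, dtype=float)
    idx = np.arange(period, attn.shape[0])
    return float(attn[idx, idx - period + 1].mean())


def screen_heads(attn_by_head, period, threshold=0.5):
    """Label each head with the patterns whose score reaches the threshold.

    The threshold value is illustrative only.
    """
    labels = {}
    for head, attn in attn_by_head.items():
        found = []
        if induction_score(attn, period) >= threshold:
            found.append("induction")
        if previous_token_score(attn) >= threshold:
            found.append("previous-token")
        labels[head] = found
    return labels
```

Heads that rank high on the spectral signal but match no canonical pattern are not discarded by a screen like this; they simply remain unlabeled for the capability under study.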

Topics

Mechanistic Interpretability
Attention Heads
Transformer Circuits
Spectral Analysis
Causal Ablation
Participation Ratio

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.