Closure-Validated Circuit Discovery in Attention Heads: Co-activation Proposes, Ablation Disposes

2026-06-08 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new method, "closure-validated circuit discovery," identifies functional groups of attention heads in large language models by clustering co-activation statistics and then validating each cluster through causal ablation. The approach, adapted from sparse autoencoder clustering, was applied to two dense 1B-scale models, Pythia 1B and OLMo 1B, and to one Mixture-of-Experts model, OLMoE-1B-7B. In Pythia 1B and OLMo 1B, the discovered attention head communities passed the closure test: ablating them caused predictable per-example damage. In OLMoE-1B-7B, route-conditional clustering produced a signal that failed the closure test, with ablation improving loss rather than degrading it. The research concludes that co-activation signals are only circuit proposals, and that causal ablation via a closure test is needed to confirm a functional circuit.

Key takeaway

AI scientists and NLP engineers developing or interpreting large language models should integrate causal ablation and closure tests into their circuit discovery workflows. Co-activation statistics alone are insufficient for identifying functional attention head circuits, because they only yield proposals. Validation must include ablation to confirm that a proposed circuit is actually functional, especially in complex architectures such as Mixture-of-Experts models, where ablation can yield counter-intuitive results.

Key insights

Co-activation proposes circuits, but causal ablation via closure confirms them in attention heads.

Principles

Co-activation is a circuit proposal.
Causal ablation validates circuit function.
Closure separates proposals from confirmed circuits.

Method

Cluster attention heads using co-activation statistics, then perform a closure test by causally ablating the discovered community and comparing per-example damage to matched-random controls.
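
The two stages described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the summary does not give the clustering algorithm or the exact closure statistic, so the sketch assumes correlation-thresholded connected components for clustering and a simple pass rule (mean per-example damage is positive and exceeds every matched-random control). The names propose_communities, closure_test, damage_fn, and threshold are hypothetical, and damage_fn stands in for whatever routine ablates a set of heads and returns the per-example loss change.

```python
import numpy as np

def propose_communities(acts, threshold=0.5):
    """Propose head communities from co-activation (assumed: Pearson correlation).

    acts: (n_examples, n_heads) array of per-example head activation strengths.
    Returns connected components of the graph linking heads whose
    correlation exceeds `threshold`, as sorted lists of head indices.
    """
    corr = np.corrcoef(acts, rowvar=False)
    adj = corr > threshold
    seen, communities = set(), []
    for start in range(corr.shape[0]):
        if start in seen:
            continue
        seen.add(start)
        stack, comp = [start], []
        while stack:  # depth-first walk over strongly co-activating heads
            h = stack.pop()
            comp.append(h)
            for nb in np.flatnonzero(adj[h]):
                nb = int(nb)
                if nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        communities.append(sorted(comp))
    return communities

def closure_test(damage_fn, community, n_heads, n_controls=20, seed=0):
    """Compare ablating `community` with matched-random head sets.

    damage_fn(heads) -> array of per-example loss change (ablated minus clean).
    Controls have the same size as the community and are drawn from heads
    outside it (an assumption about what "matched" means here).
    """
    rng = np.random.default_rng(seed)
    outside = [h for h in range(n_heads) if h not in set(community)]
    community_damage = float(np.mean(damage_fn(community)))
    control_damage = []
    for _ in range(n_controls):
        heads = rng.choice(outside, size=len(community), replace=False).tolist()
        control_damage.append(float(np.mean(damage_fn(heads))))
    # Assumed pass rule: ablation must hurt, and hurt more than any control.
    passed = community_damage > 0 and community_damage > max(control_damage)
    return passed, community_damage, control_damage
```

Under this pass rule, a community whose ablation lowers loss, as reported for OLMoE-1B-7B, fails the test no matter how strong its co-activation signal is, which is the distinction between a proposal and a confirmed circuit.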

In practice

Use closure tests for circuit validation.
Distinguish proposals from confirmed circuits.
Apply to dense and MoE models.

Topics

Attention Heads
Circuit Discovery
Causal Ablation
Model Interpretability
Large Language Models
Mixture-of-Experts

Best for: Research Scientist, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.