Detection vs. Execution: Single-Bucket Probes Miss Half the Mamba-2 State Sink

Source: Artificial Intelligence · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation) · Depth: Expert, quick

Summary

A study on Mamba-2 shows that a common assumption in mechanistic interpretability can fail systematically: that a probe which identifies a representational signature also identifies the circuit that executes the behavior. The researchers found that single-bucket probes for the Mamba-2 state sink, the analogue of the attention sink, recover only a small execution layer and miss a much larger detection layer that carries the same representational signature. The state sink decomposes into two functional head sets. BOS-specialist heads (about 5% of heads at 2.7B) causally support BOS-context and newline-target predictions. Dual heads (27-35% of heads) show stronger representational similarity but weaker causal effects. Ablating the BOS-specialist heads collapsed RULER NIAH retrieval accuracy from 1.00 to 0.00 at a context length of 1024 in both Mamba-1 2.8B and Mamba-2 2.7B, which confirms their functional importance. The study implicates Mamba-2's head-shared Delta projection in this split and concludes that separating execution circuits from detection circuits requires class-conditional ablation.

Key takeaway

Machine learning engineers interpreting Mamba-2's internal mechanisms should recognize that single-bucket probes may identify detection layers without the corresponding execution circuits. Representational similarity alone does not guarantee a causal effect, so interpretability work should use class-conditional ablation to tell the functional head sets apart. The distinction matters for understanding and modifying model behavior accurately, especially on tasks such as RULER NIAH retrieval, where the BOS-specialist heads are critical.

Key insights

Representational similarity in Mamba-2 does not imply functional equivalence for mechanistic interpretability probes.

Principles

Method

Distinguish detection from execution circuits by using class-conditional ablation, rather than just class-conditional cosine similarity, especially when probes recover both at coarse granularity.
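
The contrast between the two measurements can be illustrated with a toy sketch. This is not the study's code or setup: the two-head linear "model", the synthetic data, and the function names `class_conditional_cosine` and `class_conditional_ablation` are all hypothetical, chosen only to show how a head can score highest on class-conditional cosine similarity while contributing least under class-conditional zero-ablation.

```python
import numpy as np

def class_conditional_cosine(head_outputs, signature, classes):
    """Cosine between each head's mean output and a signature direction, per class.

    head_outputs: (N, H, D) per-example, per-head contributions (toy stand-in).
    Returns {class_id: (H,) cosine similarities}.
    """
    sims = {}
    for c in np.unique(classes):
        mean_out = head_outputs[classes == c].mean(axis=0)  # (H, D)
        num = mean_out @ signature
        den = np.linalg.norm(mean_out, axis=1) * np.linalg.norm(signature) + 1e-12
        sims[int(c)] = num / den
    return sims

def class_conditional_ablation(head_outputs, readout, targets, classes):
    """Mean increase in squared error per class when each head is zero-ablated.

    The toy prediction is readout . sum_h head_outputs[:, h, :].
    Returns {class_id: (H,) loss deltas}.
    """
    contrib = head_outputs @ readout                  # (N, H) scalar contribution per head
    full = contrib.sum(axis=1)                        # (N,) unablated prediction
    base_loss = (full - targets) ** 2                 # (N,)
    ablated = full[:, None] - contrib                 # (N, H) prediction with head h removed
    delta = (ablated - targets[:, None]) ** 2 - base_loss[:, None]
    return {int(c): delta[classes == c].mean(axis=0) for c in np.unique(classes)}

# Synthetic data (assumption): 4 examples, 2 heads, 2-dim outputs, 2 token classes.
classes = np.array([0, 0, 1, 1])
head_outputs = np.zeros((4, 2, 2))
head_outputs[2:, 0] = [2.0, 2.0]   # head 0: large output, only partly aligned with the signature
head_outputs[2:, 1] = [0.1, 0.0]   # head 1: tiny output, perfectly aligned with the signature

signature = np.array([1.0, 0.0])   # hypothetical "sink" direction
readout = np.array([1.0, 0.0])     # hypothetical linear readout
targets = head_outputs.sum(axis=1) @ readout  # baseline loss is zero by construction

sims = class_conditional_cosine(head_outputs, signature, classes)
effects = class_conditional_ablation(head_outputs, readout, targets, classes)

print("cosine on class 1:", sims[1])
print("ablation effect on class 1:", effects[1])
```

In this toy case head 1 has the higher cosine similarity on class 1, yet removing head 0 is what damages the class 1 predictions. Cosine similarity ignores magnitude and downstream use, which is one reason a similarity-only probe can rank heads differently from an ablation.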

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.