A Three-Phase Factual Recall Circuit in Gemma-2B and Gemma-12B-IT
Summary
BizzaroWorld, a mechanistic interpretability study, localized a three-phase factual recall circuit in the Gemma-2B and Gemma-12B-IT large language models. Using activation patching across 60 prompt pairs spanning 20 knowledge categories, the study identified three distinct stages of factual recall. In Gemma-2B, Phase 1 (Storage) occurs in layers 0-14 at the entity token position, where the residual stream dominates the measured effect. Phase 2 (Routing) relies on distributed attention heads that move the signal to the final prediction position; no single head was solely responsible. Phase 3 (Readout) occurs in layers 15-17 at the final token position, where the answer is retrieved. The same circuit replicated in Gemma-12B-IT, with storage extending across layers 0-27 and readout in the final layers, which suggests the structure holds at larger scale. The study also highlighted tokenizer-induced dataset drift: three prompt pairs had to be excluded for Gemma-12B-IT because of tokenization differences, which complicates cross-model comparisons.
Key takeaway
For research scientists investigating LLM internal mechanisms, the three-phase factual recall circuit in Gemma models is a useful reference point. Account for tokenizer-induced dataset drift when comparing models by pre-running prompt sets through every target architecture. This keeps cross-model mechanistic comparisons valid and informs targeted interventions when factual recall fails. Consider path patching to map causal relationships more precisely.
Key insights
Gemma models process factual recall via a consistent three-phase circuit: storage, distributed routing, and final readout.
Principles
- Factual knowledge is encoded as directions in the residual stream.
- Tokenizer differences can cause dataset drift across models.
- Mechanistic interpretability can localize model behaviors.
Method
The study used activation patching with logit differences between clean/corrupt prompt pairs, measured by a "TotalSwing" metric, to isolate components across layers and sublayers in Gemma-2B and Gemma-12B-IT.
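To make the patching idea concrete, here is a minimal sketch on a hypothetical three-layer, two-position toy model. It is not the study's code and it does not reproduce the "TotalSwing" metric, whose definition is not given in this summary. Instead it reports a commonly used normalized logit-difference recovery (0 means no effect, 1 means the clean behavior is fully restored). The layer functions, the names `run` and `patch_sweep`, and the scalar residual stream are all assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple

ENTITY, FINAL = 0, 1          # token positions in the toy model
Resid = List[float]           # one scalar of "residual stream" per position
Layer = Callable[[Resid], Resid]


def enrich(resid: Resid) -> Resid:
    # Toy per-position MLP: amplifies whatever each position already holds.
    return [r + 0.5 * r for r in resid]


def route(resid: Resid) -> Resid:
    # Toy attention: copies the entity signal into the final position.
    out = list(resid)
    out[FINAL] += resid[ENTITY]
    return out


def readout(resid: Resid) -> Resid:
    # Toy late MLP: doubles each position's signal before unembedding.
    return [r + r for r in resid]


TOY_LAYERS: List[Layer] = [enrich, route, readout]


def run(x: Resid, layers: List[Layer],
        patch: Optional[Tuple[int, int, float]] = None) -> Tuple[Resid, List[Resid]]:
    """Run the toy model; optionally overwrite one (layer, position) activation."""
    resid = list(x)
    cache: List[Resid] = []
    for i, layer in enumerate(layers):
        resid = layer(resid)
        if patch is not None and patch[0] == i:
            resid[patch[1]] = patch[2]   # splice in the cached clean activation
        cache.append(list(resid))
    return resid, cache


def logit_diff(resid: Resid) -> float:
    # Toy unembedding: the final position's value stands in for
    # logit(correct answer) - logit(incorrect answer).
    return resid[FINAL]


def patch_sweep(clean_x: Resid, corrupt_x: Resid,
                layers: List[Layer]) -> Dict[Tuple[int, int], float]:
    """Patch each (layer, position) from the clean run into the corrupt run."""
    clean_out, clean_cache = run(clean_x, layers)
    corrupt_out, _ = run(corrupt_x, layers)
    clean_ld, corrupt_ld = logit_diff(clean_out), logit_diff(corrupt_out)
    effects: Dict[Tuple[int, int], float] = {}
    for layer_idx in range(len(layers)):
        for pos in (ENTITY, FINAL):
            patched_out, _ = run(
                corrupt_x, layers,
                patch=(layer_idx, pos, clean_cache[layer_idx][pos]),
            )
            # Fraction of the clean-vs-corrupt gap recovered by this patch.
            effects[(layer_idx, pos)] = (
                (logit_diff(patched_out) - corrupt_ld) / (clean_ld - corrupt_ld)
            )
    return effects


if __name__ == "__main__":
    for key, value in sorted(patch_sweep([1.0, 0.0], [-1.0, 0.0], TOY_LAYERS).items()):
        print(key, value)
```

In this toy, patching matters at the entity position only before the routing layer and at the final position from the routing layer onward, which loosely mirrors the storage, routing, and readout pattern described in the summary. A real experiment would cache and patch activations of the actual model rather than hand-written layers.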
In practice
- Pre-run fact batteries through all models to detect tokenizer drift.
- Use TransformerLens for detailed component isolation in LLMs.
- Consider path patching for edge-level causal relationships.
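The first recommendation above can be sketched as a small pre-flight filter. This is an illustrative sketch, not the study's procedure: it assumes that a prompt pair is usable only if its clean and corrupt prompts tokenize to the same length under every tokenizer, a common requirement for position-aligned patching. The function `find_drift` and the two toy tokenizers are hypothetical stand-ins; with real models you would pass each model's own tokenizer callable instead.

```python
from typing import Callable, Dict, List, Tuple

Tokenizer = Callable[[str], List[str]]
Pair = Tuple[str, str]  # (clean prompt, corrupt prompt)


def whitespace_tokenizer(text: str) -> List[str]:
    # Hypothetical tokenizer A: one token per whitespace-separated word.
    return text.split()


def toy_subword_tokenizer(text: str) -> List[str]:
    # Hypothetical tokenizer B: words longer than 6 characters split in two.
    tokens: List[str] = []
    for word in text.split():
        if len(word) > 6:
            tokens.extend([word[:4], word[4:]])
        else:
            tokens.append(word)
    return tokens


def find_drift(
    pairs: List[Pair], tokenizers: Dict[str, Tokenizer]
) -> Tuple[List[Pair], List[Tuple[Pair, List[str]]]]:
    """Split pairs into those aligned under every tokenizer and those that drift.

    Assumed criterion: clean and corrupt prompts must have equal token counts.
    """
    kept: List[Pair] = []
    dropped: List[Tuple[Pair, List[str]]] = []
    for clean, corrupt in pairs:
        misaligned = [
            name for name, tok in tokenizers.items()
            if len(tok(clean)) != len(tok(corrupt))
        ]
        if misaligned:
            dropped.append(((clean, corrupt), misaligned))
        else:
            kept.append((clean, corrupt))
    return kept, dropped


if __name__ == "__main__":
    pairs = [
        ("The capital of France is", "The capital of Germany is"),
        ("The capital of France is", "The capital of Poland is"),
    ]
    tokenizers = {"whitespace": whitespace_tokenizer, "toy_subword": toy_subword_tokenizer}
    kept, dropped = find_drift(pairs, tokenizers)
    print("kept:", kept)
    print("dropped:", dropped)
```

Running the filter once over the full prompt set, and then using only the kept pairs for every model, keeps the evaluated dataset identical across architectures instead of letting each model silently drop a different subset.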
Topics
- Mechanistic Interpretability
- Gemma Models
- Factual Recall Circuits
- Activation Patching
- Tokenizer Drift
- TransformerLens
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.