A Three-Phase Factual Recall Circuit in Gemma-2B and Gemma-12B-IT

2026-06-24 · Source: Towards Data Science · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

BizzaroWorld, a mechanistic interpretability study, localized a three-phase factual recall circuit in the Gemma-2B and Gemma-12B-IT large language models. Using activation patching across 60 prompt pairs spanning 20 knowledge categories, the study identified distinct stages of factual knowledge processing. In Gemma-2B, Phase 1 (Storage) occurs in layers 0-14 at the entity token position, with the residual stream dominating. Phase 2 (Routing) moves the signal to the final prediction position through distributed attention heads; no single head was solely responsible. Phase 3 (Readout) happens in layers 15-17 at the final token position, where the answer is retrieved. The circuit replicated in Gemma-12B-IT, with storage shifting to layers 0-27 and readout occurring in the final layers, which suggests the structure carries over to the larger model. The study also highlighted tokenizer-induced dataset drift: three prompt pairs had to be excluded for Gemma-12B-IT because of tokenization differences, which complicates cross-model comparisons.

Key takeaway

For research scientists investigating LLM internal mechanisms, the three-phase factual recall circuit in Gemma models gives a concrete map of where a fact is stored, routed, and read out. When comparing models, account for tokenizer-induced dataset drift by running the full prompt set through every target model before the analysis. This keeps cross-model mechanistic comparisons valid and informs targeted interventions when factual recall fails. Consider path patching for more precise mapping of causal relationships.

Key insights

Gemma models process factual recall via a consistent three-phase circuit: storage, distributed routing, and final readout.

Principles

Factual knowledge is encoded as directions in the residual stream.
Tokenizer differences can cause dataset drift across models.
Mechanistic interpretability can localize model behaviors.

Method

The study applied activation patching to clean and corrupted prompt pairs, scoring each patch by its effect on the logit difference and quantifying it with a "TotalSwing" metric, to isolate the components that matter across layers and sublayers in Gemma-2B and Gemma-12B-IT.
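To make the patching idea concrete, here is a minimal sketch on a toy two-layer residual model rather than on Gemma. Everything in it is an assumption for illustration: the toy layers, the function names, and the scoring rule. This summary does not define "TotalSwing", so the score below is the commonly used normalized recovery of the logit difference, not necessarily the study's metric.

```python
from typing import Callable, List, Optional, Tuple

Vector = List[float]
Layer = Callable[[Vector], Vector]


def forward(x: Vector, layers: List[Layer],
            patch: Optional[Tuple[int, Vector]] = None,
            cache: Optional[List[Vector]] = None) -> Vector:
    """Run a residual stack: each layer's output is added to the stream.

    patch: optional (layer_index, output) that overwrites that layer's output.
    cache: optional list that receives every layer's unpatched output.
    """
    resid = list(x)
    for i, layer in enumerate(layers):
        out = layer(resid)
        if cache is not None:
            cache.append(list(out))
        if patch is not None and patch[0] == i:
            out = list(patch[1])
        resid = [r + o for r, o in zip(resid, out)]
    return resid


# Toy stream layout (hypothetical): dims 0-1 encode the entity,
# dims 2-3 are the logits for the correct and the wrong answer.
def noop_layer(resid: Vector) -> Vector:
    return [0.0, 0.0, 0.0, 0.0]


def readout_layer(resid: Vector) -> Vector:
    # Copies the entity encoding into the answer logits.
    return [0.0, 0.0, resid[0], resid[1]]


def logit_diff(resid: Vector) -> float:
    return resid[2] - resid[3]


def patching_scores(clean_x: Vector, corrupt_x: Vector,
                    layers: List[Layer]) -> List[float]:
    """Patch each layer's clean output into the corrupted run, one at a time.

    Score = (patched - corrupt) / (clean - corrupt): 0.0 means no recovery,
    1.0 means the clean logit difference is fully restored.
    """
    clean_cache: List[Vector] = []
    clean_ld = logit_diff(forward(clean_x, layers, cache=clean_cache))
    corrupt_ld = logit_diff(forward(corrupt_x, layers))
    scores = []
    for i in range(len(layers)):
        patched = forward(corrupt_x, layers, patch=(i, clean_cache[i]))
        scores.append((logit_diff(patched) - corrupt_ld) / (clean_ld - corrupt_ld))
    return scores


LAYERS = [noop_layer, readout_layer]
SCORES = patching_scores([1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], LAYERS)
print(SCORES)
```

In this toy, only the second layer's output carries the answer, so patching it restores the clean behavior while patching the first layer does nothing. On a real model the same loop would run over layers, sublayers, and token positions.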

In practice

Pre-run fact batteries through all models to detect tokenizer drift.
Use TransformerLens for detailed component isolation in LLMs.
Consider path patching for edge-level causal relationships.
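The first recommendation above, pre-running the fact battery through every model, can be sketched as a simple filter. The sketch assumes the exclusion criterion is that the clean and corrupted prompts must tokenize to the same length under each tokenizer (a common requirement for position-aligned patching); the summary does not state the study's actual criterion. The two tokenizers are toy stand-ins, and all names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Tokenizer = Callable[[str], Sequence[str]]
PromptPair = Tuple[str, str]  # (clean prompt, corrupted prompt)


def whitespace_tokenizer(text: str) -> List[str]:
    # Toy stand-in for one model's tokenizer.
    return text.split()


def chunk_tokenizer(text: str) -> List[str]:
    # Toy stand-in for a second tokenizer: splits words into 5-char pieces.
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + 5] for i in range(0, len(word), 5))
    return tokens


def find_drift(pairs: List[PromptPair],
               tokenizers: Dict[str, Tokenizer]
               ) -> Tuple[List[PromptPair], Dict[PromptPair, List[str]]]:
    """Split pairs into those aligned under every tokenizer and those that are not.

    Returns (kept, dropped), where dropped maps each failing pair to the names
    of the tokenizers under which clean and corrupt lengths differ.
    """
    kept: List[PromptPair] = []
    dropped: Dict[PromptPair, List[str]] = {}
    for clean, corrupt in pairs:
        failing = [name for name, tok in tokenizers.items()
                   if len(tok(clean)) != len(tok(corrupt))]
        if failing:
            dropped[(clean, corrupt)] = failing
        else:
            kept.append((clean, corrupt))
    return kept, dropped


PAIRS = [
    ("The capital of France is", "The capital of Spain is"),
    ("The capital of Italy is", "The capital of Spain is"),
]
TOKENIZERS = {"whitespace": whitespace_tokenizer, "chunk": chunk_tokenizer}
KEPT, DROPPED = find_drift(PAIRS, TOKENIZERS)
print(KEPT)
print(DROPPED)
```

Running the filter before any patching means every model is evaluated on the same surviving pairs, so a dropped pair shows up as an explicit exclusion rather than as a silent difference between datasets.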

Topics

Mechanistic Interpretability
Gemma Models
Factual Recall Circuits
Activation Patching
Tokenizer Drift
TransformerLens

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.