Transformer Field Theory: A Response-Theoretic Approach to Mechanistic Interpretability

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, extended

Summary

The paper introduces a field-theoretic framework for Transformer mechanistic interpretability that treats the residual stream as a field over depth and token position. Activation patching is formulated as localized source insertion, and patch effects are predicted using sensitivity fields and Green-function responses. Empirical tests on GPT-2-style autoregressive Transformers confirm a bounded local linear regime in which first-order sensitivities predict patch effects across residual sites. The framework also measures structured anisotropic propagation and shows that prompt-induced residual displacements can transfer answer behavior. Together, these results establish response objects such as sensitivities, propagated fields, and Green-operator slices as a practical language for organizing patching experiments and as a mathematical basis for patch-site inference and cross-scale transfer.
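
As a reading aid, the first-order relations implied by the summary can be written out explicitly. The notation here is an assumption chosen for illustration and is not necessarily the paper's: h(l,t) is the residual vector at depth l and token t, y is a scalar readout such as a logit difference, L is the final depth, and a patch adds a small source δh(l,t) at one site.

```latex
% Sensitivity field: gradient of the readout with respect to a residual site
s(\ell,t) = \frac{\partial y}{\partial h(\ell,t)}, \qquad
\Delta y \approx s(\ell,t)^{\top}\, \delta h(\ell,t)

% Green-operator slice: linear response of a downstream site to the source
G(\ell',t';\ell,t) = \frac{\partial h(\ell',t')}{\partial h(\ell,t)}, \qquad
\delta h(\ell',t') \approx G(\ell',t';\ell,t)\, \delta h(\ell,t), \quad \ell' \ge \ell

% Chain rule linking the two response objects
s(\ell,t)^{\top} = \sum_{t'} s(L,t')^{\top}\, G(L,t';\ell,t)
```

In an autoregressive model with causal attention, G vanishes whenever t' < t, so a source at token t can only influence the same or later positions. The approximations hold only for sufficiently small δh, which is consistent with the "bounded local linear regime" described above.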

Key takeaway

For AI Scientists and Machine Learning Engineers focused on Transformer interpretability, this field-theoretic framework offers a principled shift from enumerative patching to predictive, operator-based analysis. You should explore using autograd sensitivities to efficiently predict patch effects and identify critical intervention sites. This approach provides a mathematical basis for inferring optimal patch locations and understanding how model behavior scales across different Transformer sizes, streamlining mechanistic interpretability efforts.

Key insights

A field-theoretic framework unifies Transformer patching, prediction, and interpretability through response functions.

Principles

Method

Formulate the residual stream as a depth-token field; model patching as localized source insertion; predict patch effects via sensitivity fields and Green-function responses; use an adjoint variational problem for patch selection.
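
The idea of predicting a patch effect from a first-order sensitivity can be illustrated with a minimal sketch. This is a toy, not the paper's implementation: it assumes a single-token residual stack with update h_{l+1} = h_l + tanh(W_l h_l) and a linear readout, omits attention and token mixing entirely, and every name (`forward`, `sensitivities`, `patch_effect`, `compare`) is hypothetical. The adjoint pass is written by hand where a real experiment would use autograd.

```python
import math
import random


def matvec(W, x):
    """Dense matrix-vector product on plain Python lists."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def forward(h, weights, start=0):
    """Apply residual layers weights[start:]; return every state visited."""
    states = [h]
    for W in weights[start:]:
        pre = matvec(W, h)
        h = [h_i + math.tanh(p) for h_i, p in zip(h, pre)]
        states.append(h)
    return states


def readout(h, w_out):
    """Scalar readout y = w_out . h (stand-in for a logit difference)."""
    return sum(a * b for a, b in zip(h, w_out))


def sensitivities(states, weights, w_out):
    """Adjoint (backward) pass: grads[l] = d y / d states[l]."""
    g = list(w_out)
    grads = [g]
    for l in reversed(range(len(weights))):
        W = weights[l]
        pre = matvec(W, states[l])
        # Gradient through the tanh branch: W^T (diag(1 - tanh^2(pre)) g)
        scaled = [(1.0 - math.tanh(p) ** 2) * g_i for p, g_i in zip(pre, g)]
        branch = [sum(W[i][j] * scaled[i] for i in range(len(W)))
                  for j in range(len(W[0]))]
        # The skip connection passes g through unchanged; add the branch term.
        g = [g_j + b_j for g_j, b_j in zip(g, branch)]
        grads.append(g)
    grads.reverse()
    return grads


def patch_effect(states, weights, w_out, layer, delta):
    """Actual change in y when delta is added to the state entering `layer`."""
    patched = [h + d for h, d in zip(states[layer], delta)]
    final = forward(patched, weights, start=layer)[-1]
    return readout(final, w_out) - readout(states[-1], w_out)


# Toy setup with arbitrary random weights (illustrative values only).
random.seed(0)
dim, depth = 4, 3
weights = [[[random.uniform(-0.5, 0.5) for _ in range(dim)]
            for _ in range(dim)] for _ in range(depth)]
w_out = [random.uniform(-1, 1) for _ in range(dim)]
h0 = [random.uniform(-1, 1) for _ in range(dim)]
direction = [random.uniform(-1, 1) for _ in range(dim)]

states = forward(h0, weights)
grads = sensitivities(states, weights, w_out)


def compare(layer, scale):
    """Return (first-order prediction, actual effect) for a scaled patch."""
    delta = [scale * d for d in direction]
    predicted = sum(g * d for g, d in zip(grads[layer], delta))
    return predicted, patch_effect(states, weights, w_out, layer, delta)


for scale in (1e-3, 1e-1, 1.0):
    predicted, actual = compare(0, scale)
    print(f"scale={scale:g} predicted={predicted:.6f} actual={actual:.6f}")
```

Small patches should show close agreement between the prediction and the actual effect, with the gap widening as the patch grows, which is the toy analogue of a bounded linear regime. One backward pass yields the sensitivity at every site, so candidate patch sites can be ranked by predicted effect before any patched forward pass is run.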

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.