Disentangling Direction and Magnitude in Transformer Representations: A Double Dissociation Through L2-Matched Perturbation Analysis
Summary
A study on Pythia-family models (410M-1.4B parameters) investigates whether the direction (orientation) and magnitude (L2 norm) of Transformer hidden states serve distinct functional roles. Using L2-matched perturbation analysis, a method that gives both perturbation types the same Euclidean displacement, the research reveals a "cross-over dissociation." Angular perturbations increase language modeling loss up to 42.9 times more than magnitude perturbations do, while magnitude perturbations cause a disproportionately larger drop in syntactic processing accuracy (20.4% vs. 1.6% on subject-verb agreement at δ = 5). Causal intervention experiments further show that angular damage flows primarily through attention pathways (28.4% loss recovery via attention repair), whereas magnitude damage is partly mediated by LayerNorm pathways (29.9% recovery via LayerNorm repair). These patterns replicate across scales within the Pythia architecture but differ in RMSNorm-based architectures such as TinyLlama-1.1B, suggesting that architectural choices influence these functional roles.
Key takeaway
Research scientists working on Transformer interpretability or model editing should treat vector direction and magnitude as functionally distinct. When modifying model behavior, preserve direction to protect language modeling quality and factual knowledge, and preserve magnitude to protect syntactic coherence. This approach enables more precise interventions with fewer unintended side effects, especially in LayerNorm-based architectures.
Key insights
Direction and magnitude of Transformer hidden states have distinct computational roles in LayerNorm-based architectures.
Principles
- Direction governs attentional routing.
- Magnitude modulates processing intensity for syntax.
- The normalization type (LayerNorm vs. RMSNorm) affects which role each geometric property plays.
Method
L2-matched perturbation analysis ensures angular and magnitude perturbations achieve identical Euclidean displacements, enabling controlled comparison of their functional importance in Transformer hidden states.
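One way to build such a matched pair is sketched below. This is an illustrative construction, not the paper's code: the function names, the random choice of rotation plane, and the 64-dimensional toy vector are all assumptions. The magnitude perturbation rescales the vector along its own direction, and the angular perturbation rotates it at fixed norm by the angle whose chord length equals the same δ.

```python
import math
import random

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def magnitude_perturb(h, delta):
    """Scale h along its own direction so that ||h' - h|| == delta."""
    r = norm(h)
    scale = (r + delta) / r
    return [scale * x for x in h]

def angular_perturb(h, delta, rng):
    """Rotate h at fixed norm so that ||h' - h|| == delta."""
    r = norm(h)
    if delta > 2 * r:
        raise ValueError("delta cannot exceed 2 * ||h|| for a pure rotation")
    # Random unit vector orthogonal to h (one Gram-Schmidt step on Gaussian noise).
    g = [rng.gauss(0.0, 1.0) for _ in h]
    proj = sum(a * b for a, b in zip(g, h)) / (r * r)
    u = [a - proj * b for a, b in zip(g, h)]
    u_norm = norm(u)
    u = [x / u_norm for x in u]
    # A rotation by theta on a sphere of radius r has chord length 2 r sin(theta / 2).
    theta = 2.0 * math.asin(delta / (2.0 * r))
    return [math.cos(theta) * a + math.sin(theta) * r * b for a, b in zip(h, u)]

# Toy hidden state (assumption: 64 dims, Gaussian entries) and an illustrative delta.
rng = random.Random(0)
h = [rng.gauss(0.0, 1.0) for _ in range(64)]
delta = 5.0

h_mag = magnitude_perturb(h, delta)
h_ang = angular_perturb(h, delta, rng)

disp_mag = norm([a - b for a, b in zip(h_mag, h)])
disp_ang = norm([a - b for a, b in zip(h_ang, h)])
```

Both `disp_mag` and `disp_ang` equal `delta` up to floating-point error, so any difference in downstream loss can be attributed to the type of perturbation rather than its size. Note that a pure rotation can displace a vector by at most twice its norm, which bounds the usable δ.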
In practice
- Analyze direction and magnitude separately for interpretability.
- Target edits to direction for factual knowledge.
- Target edits to magnitude for syntactic coherence.
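The points above can be made concrete with a small sketch that separates a hidden state into norm and unit direction and then edits one while holding the other fixed. The helper names (`split`, `edit_direction`, `edit_magnitude`) and the two-dimensional example are hypothetical, chosen only for illustration; the source does not prescribe a specific editing procedure.

```python
import math

def split(h):
    """Return (norm, unit_direction) of a hidden-state vector."""
    r = math.sqrt(sum(x * x for x in h))
    return r, [x / r for x in h]

def edit_direction(h, edit):
    """Add `edit` to h, then rescale so the original norm is preserved."""
    r, _ = split(h)
    _, new_dir = split([a + b for a, b in zip(h, edit)])
    return [r * x for x in new_dir]

def edit_magnitude(h, factor):
    """Rescale h by `factor`; the direction is unchanged."""
    return [factor * x for x in h]

h = [3.0, 4.0]                            # toy vector with norm 5
steered = edit_direction(h, [1.0, 0.0])   # new direction, same norm
scaled = edit_magnitude(h, 2.0)           # same direction, doubled norm
```

Keeping the two edit types separate makes it possible to check which geometric property a given intervention actually changed.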
Topics
- Transformer Representations
- L2-Matched Perturbation Analysis
- Language Model Interpretability
- Layer Normalization
- Syntactic Processing
Best for: Research Scientist, AI Researcher, AI Scientist, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.