Disentangling Direction and Magnitude in Transformer Representations: A Double Dissociation Through L2-Matched Perturbation Analysis

2026-02-13 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

A study of Pythia-family models (410M to 1.4B parameters) investigates whether the direction (orientation) and magnitude (L2 norm) of Transformer hidden states serve distinct functional roles. It uses L2-matched perturbation analysis, a method that gives both perturbation types the same Euclidean displacement, and reports a "cross-over dissociation." Angular perturbations increase language modeling loss up to 42.9 times more than magnitude perturbations do, while magnitude perturbations cause a disproportionately larger drop in syntactic processing accuracy (20.4% vs. 1.6% for angular perturbations on subject-verb agreement at δ = 5). Causal intervention experiments further show that angular damage flows primarily through attention pathways (28.4% loss recovery via attention repair), whereas magnitude damage is partly mediated by LayerNorm pathways (29.9% recovery via LayerNorm repair). These patterns replicate across scales within the Pythia architecture but differ in RMSNorm-based architectures such as TinyLlama-1.1B, suggesting that architectural choices influence these functional roles.

Key takeaway

Research scientists working on Transformer interpretability or model editing should treat vector direction and magnitude as functionally distinct. When modifying model behavior, preserve direction to protect language modeling quality and factual knowledge, and preserve magnitude to protect syntactic coherence. This can allow more precise interventions with fewer unintended side effects, especially in LayerNorm-based architectures.

Key insights

Direction and magnitude of Transformer hidden states have distinct computational roles in LayerNorm-based architectures.

Principles

Direction governs attentional routing.
Magnitude modulates processing intensity for syntax.
Normalization type (LayerNorm vs. RMSNorm) affects the roles that direction and magnitude play.
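
As background for the last principle, the sketch below contrasts the standard definitions of LayerNorm and RMSNorm on a toy vector, with learned gain and bias omitted. It only shows how the two operations differ geometrically: both are invariant to rescaling their input (up to the epsilon term), but only LayerNorm removes a constant offset. It does not reproduce the study's experiments or explain the reported TinyLlama-1.1B behavior; the function names, the example vector, and the `eps` value are illustrative assumptions.

```python
import math


def layer_norm(x, eps=1e-5):
    """LayerNorm without learned gain/bias: center, then divide by std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def rms_norm(x, eps=1e-5):
    """RMSNorm without learned gain: divide by root mean square, no centering."""
    ms = sum(v * v for v in x) / len(x)
    return [v / math.sqrt(ms + eps) for v in x]


def max_abs_diff(a, b):
    """Largest elementwise absolute difference between two vectors."""
    return max(abs(p - q) for p, q in zip(a, b))


# Toy "hidden state" (illustrative values, not taken from any model).
x = [0.5, -1.2, 3.0, 0.7, -2.1, 1.4]
scaled = [3.0 * v for v in x]    # same direction, larger magnitude
shifted = [v + 2.0 for v in x]   # constant offset added to every coordinate

ln_scale_gap = max_abs_diff(layer_norm(x), layer_norm(scaled))
rms_scale_gap = max_abs_diff(rms_norm(x), rms_norm(scaled))
ln_shift_gap = max_abs_diff(layer_norm(x), layer_norm(shifted))
rms_shift_gap = max_abs_diff(rms_norm(x), rms_norm(shifted))

print("scale gap  LayerNorm / RMSNorm:", ln_scale_gap, rms_scale_gap)
print("shift gap  LayerNorm / RMSNorm:", ln_shift_gap, rms_shift_gap)
```

The two scale gaps are small and come only from `eps`; the shift gap is negligible for LayerNorm and clearly nonzero for RMSNorm, because RMSNorm does not subtract the mean.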

Method

L2-matched perturbation analysis ensures angular and magnitude perturbations achieve identical Euclidean displacements, enabling controlled comparison of their functional importance in Transformer hidden states.
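
One way to picture the matching constraint is the minimal sketch below. The summary does not specify the paper's exact construction, so this is an assumed one: the magnitude perturbation rescales a vector along its own direction, the angular perturbation rotates it at constant norm toward a random orthogonal direction, and the rotation angle is chosen so that both move the vector by the same Euclidean distance `delta`. All function names are hypothetical, and `delta = 5.0` simply echoes the δ = 5 quoted above without assuming the same units or scale.

```python
import math
import random


def norm(v):
    """L2 norm of a vector given as a list of floats."""
    return math.sqrt(sum(x * x for x in v))


def magnitude_perturb(h, delta):
    """Rescale h along its own direction so that ||h' - h|| == delta."""
    n = norm(h)
    scale = 1.0 + delta / n
    return [scale * x for x in h]


def angular_perturb(h, delta, rng):
    """Rotate h at constant norm so that ||h' - h|| == delta."""
    n = norm(h)
    if delta > 2.0 * n:
        raise ValueError("delta exceeds the largest possible chord, 2 * ||h||")
    # Draw a random direction and make it orthogonal to h (Gram-Schmidt).
    g = [rng.gauss(0.0, 1.0) for _ in h]
    proj = sum(a * b for a, b in zip(g, h)) / (n * n)
    u = [a - proj * b for a, b in zip(g, h)]
    un = norm(u)
    u = [a / un for a in u]
    # A rotation by theta at radius n moves the point by 2 * n * sin(theta / 2).
    theta = 2.0 * math.asin(delta / (2.0 * n))
    return [math.cos(theta) * a + math.sin(theta) * n * b for a, b in zip(h, u)]


rng = random.Random(0)
h = [rng.gauss(0.0, 1.0) for _ in range(64)]  # stand-in for a hidden state
delta = 5.0

h_mag = magnitude_perturb(h, delta)
h_ang = angular_perturb(h, delta, rng)

disp_mag = norm([a - b for a, b in zip(h_mag, h)])
disp_ang = norm([a - b for a, b in zip(h_ang, h)])
print("displacement (magnitude, angular):", disp_mag, disp_ang)
print("norm (original, after rotation):", norm(h), norm(h_ang))
```

Under this construction the two perturbed vectors are equally far from `h`, yet one changes only the norm and the other only the orientation, which is what makes a comparison of their downstream effects controlled.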

In practice

Analyze direction and magnitude separately for interpretability.
Target edits to direction for factual knowledge.
Target edits to magnitude for syntactic coherence.
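
To make the first recommendation concrete, a hidden state can be split into its L2 norm and unit direction, and any edit can then be described by the angle it introduces and the factor by which it changes the norm. The sketch below is a hedged illustration on toy vectors; `decompose` and `compare` are hypothetical helper names, not functions from the study or from any library.

```python
import math


def decompose(h):
    """Split a vector into (L2 norm, unit direction)."""
    n = math.sqrt(sum(x * x for x in h))
    if n == 0.0:
        raise ValueError("zero vector has no direction")
    return n, [x / n for x in h]


def compare(h_before, h_after):
    """Return (angle in radians, norm ratio) between two hidden states."""
    n0, u0 = decompose(h_before)
    n1, u1 = decompose(h_after)
    cos = sum(a * b for a, b in zip(u0, u1))
    cos = max(-1.0, min(1.0, cos))  # guard against rounding outside [-1, 1]
    return math.acos(cos), n1 / n0


h_before = [1.0, 2.0, 2.0]
h_scaled = [2.0, 4.0, 4.0]    # same direction, doubled norm
h_rotated = [2.0, 1.0, 2.0]   # same norm (3.0), different direction

angle_s, ratio_s = compare(h_before, h_scaled)
angle_r, ratio_r = compare(h_before, h_rotated)
print("scaled : angle, norm ratio =", angle_s, ratio_s)
print("rotated: angle, norm ratio =", angle_r, ratio_r)
```

An edit with a near-zero angle and a norm ratio far from 1 is a magnitude edit; one with a norm ratio near 1 and a nonzero angle is a direction edit. Reporting both numbers for an intervention shows which of the two properties it actually changed.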

Topics

Transformer Representations
L2-Matched Perturbation Analysis
Language Model Interpretability
Layer Normalization
Syntactic Processing

Best for: Research Scientist, AI Researcher, AI Scientist, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.