Trajectory Geometry of Transformer Representations Across Layers
Summary
A new study introduces "trajectory geometry" as a probe-free lens for mechanistic interpretability, analyzing how Transformer representations evolve across layers. Recasting the forward pass as a discrete population trajectory, the researchers apply five geometric metrics: length, curvature, semantic convergence index (CI), layerwise cosine similarity, and representational stability. Across GPT-2, TinyLlama, and Qwen2.5 models and five prompt families, four key findings emerge. First, semantically related prompts converge significantly in middle-to-late layers (peak CI 0.41-0.58, p<0.001). Second, reasoning tasks produce trajectories with greater curvature (0.71-0.83 rad) than lexical variations (0.27-0.31 rad). Third, ambiguous tokens exhibit trajectory bifurcation, with up to 5.6x representational separation by the final layer. Finally, layerwise cosine similarity reveals a universal three-phase structure: encoding, elaboration, and output preparation. A fully open-source, model-agnostic pipeline is released.
Key takeaway
For AI scientists focused on mechanistic interpretability, this research offers a probe-free approach to understanding the internal workings of Transformers. Consider integrating trajectory geometry analysis into your model development and debugging workflows, especially when diagnosing representational ambiguity or comparing how tasks of differing complexity are processed. The method lets you observe how a model processes information layer by layer, potentially revealing insights beyond traditional feature probing.
Key insights
Trajectory geometry offers a principled, probe-free lens to understand Transformer representation evolution across layers.
Principles
- Transformer representations evolve as discrete population trajectories.
- Trajectory geometry metrics reveal internal computational dynamics.
- Semantic convergence and bifurcation indicate processing stages.
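To make convergence and bifurcation concrete, here is a minimal sketch that compares two prompt trajectories layer by layer. It is not the study's convergence index, whose exact definition is not given in this summary. The function names `layerwise_distance` and `separation_ratio`, the use of Euclidean distance, and the toy data are all assumptions made for illustration.

```python
import numpy as np

def layerwise_distance(A, B):
    """Euclidean distance between two trajectories at each layer.

    A, B: arrays of shape (num_layers, hidden_dim), one row per layer.
    """
    return np.linalg.norm(A - B, axis=1)

def separation_ratio(A, B, eps=1e-12):
    """Final-layer distance divided by first-layer distance.

    Hypothetical summary statistic: a value above 1 suggests the two
    trajectories diverge (bifurcation), below 1 suggests they converge.
    """
    d = layerwise_distance(A, B)
    return float(d[-1] / max(d[0], eps))

# Toy 2-D trajectories over three "layers" (made-up values, not model data).
A = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
B = np.array([[0.0, 1.0], [1.0, 2.0], [2.0, 4.0]])

distances = layerwise_distance(A, B)  # distance grows from layer to layer
ratio = separation_ratio(A, B)        # greater than 1: the pair separates
```

With real models, `A` and `B` would hold per-layer hidden states for two prompts, for example two readings of an ambiguous token. How those states are pooled across tokens is a further choice this sketch leaves open.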
Method
The Transformer forward pass is recast as a discrete population trajectory, and five geometric metrics are computed directly in ambient space: length, curvature, semantic convergence index, layerwise cosine similarity, and representational stability.
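Three of these metrics can be sketched for a single trajectory, assuming the hidden states of one prompt are stacked into an array with one row per layer. The definitions below are common discrete-geometry choices and may differ from those in the released pipeline: summed Euclidean step length, turning angle between consecutive displacements as a curvature proxy, and cosine similarity between adjacent layers. The function names are hypothetical.

```python
import numpy as np

def trajectory_length(H):
    """Sum of Euclidean step sizes between consecutive layers.

    H: array of shape (num_layers, hidden_dim).
    """
    steps = np.diff(H, axis=0)
    return float(np.linalg.norm(steps, axis=1).sum())

def turning_angles(H, eps=1e-12):
    """Angle in radians between consecutive layer-to-layer displacements.

    Used here as a discrete curvature proxy (an assumption of this sketch).
    """
    steps = np.diff(H, axis=0)
    a, b = steps[:-1], steps[1:]
    denom = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    cos = (a * b).sum(axis=1) / np.maximum(denom, eps)
    return np.arccos(np.clip(cos, -1.0, 1.0))

def layerwise_cosine(H, eps=1e-12):
    """Cosine similarity between the representations at layers l and l+1."""
    a, b = H[:-1], H[1:]
    denom = np.linalg.norm(a, axis=1) * np.linalg.norm(b, axis=1)
    return (a * b).sum(axis=1) / np.maximum(denom, eps)

# Toy trajectory: four "layers" in 2-D (made-up values, not model data).
H = np.array([[1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [2.0, 2.0]])

length = trajectory_length(H)   # three unit steps
angles = turning_angles(H)      # one right-angle turn, then a straight step
cosines = layerwise_cosine(H)   # one value per adjacent layer pair
```

Everything is computed in the original hidden-state space, with no projection or trained probe, which is what "directly in ambient space" refers to.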
In practice
- Use trajectory geometry for mechanistic interpretability.
- Apply the open-source pipeline to analyze models.
- Investigate specific layer dynamics for task performance.
Topics
- Transformer Representations
- Mechanistic Interpretability
- Trajectory Geometry
- Neural Network Dynamics
- Large Language Models
- Model Analysis Pipeline
Best for: Research Scientist, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.