Trajectory Geometry of Transformer Representations Across Layers

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new study introduces "trajectory geometry" as a probe-free lens for mechanistic interpretability, analyzing how Transformer representations evolve across layers. The researchers recast the forward pass as a discrete population trajectory and compute five geometric metrics on it: length, curvature, semantic convergence index (CI), layerwise cosine similarity, and representational stability. Across GPT-2, TinyLlama, and Qwen2.5 models and five prompt families, four key findings emerged. First, semantically related prompts converged significantly in middle-to-late layers (peak CI 0.41-0.58, p<0.001). Second, reasoning tasks produced trajectories with greater curvature (0.71-0.83 rad) than lexical variations (0.27-0.31 rad). Third, ambiguous tokens exhibited trajectory bifurcation, with up to 5.6x representational separation by the final layer. Finally, layerwise cosine similarity revealed a three-phase structure (encoding, elaboration, and output preparation) in every model studied. The authors release a fully open-source, model-agnostic pipeline.

Key takeaway

For AI scientists focused on mechanistic interpretability, this research offers a novel, probe-free approach to understanding Transformer internals. Consider adding trajectory geometry analysis to your model development and debugging workflows, especially when diagnosing representational ambiguity or comparing how tasks of differing computational complexity are processed. The method provides a principled way to observe how your models process information layer by layer, and it may reveal structure that traditional feature probing misses.

Key insights

Trajectory geometry offers a principled, probe-free lens to understand Transformer representation evolution across layers.

Principles

Method

The method recasts the Transformer forward pass as a discrete population trajectory and computes five geometric metrics directly in ambient space: length, curvature, semantic convergence index, layerwise cosine similarity, and representational stability.
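To make the trajectory framing concrete, here is a minimal sketch of three of the five metrics for a single token's per-layer hidden states. The definitions are assumptions chosen for illustration, since this summary does not give the study's exact formulas: length is taken as the summed Euclidean step size between consecutive layers, curvature as the mean turning angle (in radians) between consecutive steps, and layerwise cosine similarity as the cosine between adjacent layers' vectors. The function name `trajectory_metrics` is hypothetical and is not taken from the released pipeline.

```python
import math


def _sub(a, b):
    # Element-wise difference of two equal-length vectors.
    return [x - y for x, y in zip(a, b)]


def _norm(v):
    # Euclidean norm.
    return math.sqrt(sum(x * x for x in v))


def _cosine(a, b):
    # Cosine similarity, clamped to [-1, 1]; defined as 0.0 for a zero vector.
    denom = _norm(a) * _norm(b)
    if denom == 0.0:
        return 0.0
    value = sum(x * y for x, y in zip(a, b)) / denom
    return max(-1.0, min(1.0, value))


def trajectory_metrics(states):
    """Compute illustrative trajectory metrics.

    states: list of per-layer vectors (one vector per layer) for one token.
    All three definitions below are assumptions, not the study's formulas.
    """
    if len(states) < 2:
        raise ValueError("need at least two layers to form a trajectory")

    # Displacement from each layer to the next.
    steps = [_sub(states[i + 1], states[i]) for i in range(len(states) - 1)]

    # Assumed length: total distance travelled across layers.
    length = sum(_norm(step) for step in steps)

    # Assumed curvature: mean turning angle between consecutive steps.
    angles = [
        math.acos(_cosine(steps[i], steps[i + 1]))
        for i in range(len(steps) - 1)
    ]
    curvature = sum(angles) / len(angles) if angles else 0.0

    # Cosine similarity between each pair of adjacent layers.
    layerwise_cosine = [
        _cosine(states[i], states[i + 1]) for i in range(len(states) - 1)
    ]

    return {
        "length": length,
        "curvature": curvature,
        "layerwise_cosine": layerwise_cosine,
    }


if __name__ == "__main__":
    # Toy three-layer trajectory in two dimensions with one right-angle turn.
    toy_states = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
    print(trajectory_metrics(toy_states))
```

In a real setting the `states` list would come from a model's per-layer hidden states for a chosen token position. The remaining two metrics, semantic convergence index and representational stability, compare trajectories across prompts and are left out here because their definitions are not given in this summary.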

Best for: Research Scientist, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.