Transformer Geometry Observatory TGO-I: Spectral Geometry Observatory
Summary
The Transformer Geometry Observatory (TGO) is a systematic framework for investigating the representational geometry and dynamics of Vision Transformers (ViTs). TGO-I, its first installment, examines the spectral geometry of ViT representations. Using a ViT-Small/16 model trained on ImageNet-100, it tracks Effective Rank, Stable Rank, Participation Ratio, Spectral Entropy, Spectral Flatness, Spectral Anisotropy, covariance structure, eigenspectra, and singular value spectra throughout training. The analysis reveals a consistent increase in dimensional utilization: anisotropy decreases, spectral entropy and participation ratio increase, and eigenspectra become progressively flatter. This challenges the common assumption that training concentrates information into a few dominant directions. Instead, variance is progressively redistributed across representational dimensions, with the final CLS token showing the highest effective dimensionality.
Key takeaway
For machine learning engineers designing or analyzing Vision Transformers, these findings challenge the intuition that training concentrates information into a few dominant directions. Variance instead redistributes progressively across representational dimensions, which increases effective dimensionality. The effect is most pronounced in the final CLS token, which exhibits the highest effective dimensionality. Account for this when interpreting a model's representations or making architectural decisions that assume low-dimensional structure.
Key insights
ViT training progressively redistributes representational variance across dimensions, increasing effective dimensionality and decreasing anisotropy, contrary to common intuition.
Principles
- ViT training increases dimensional utilization.
- Variance redistributes across ViT dimensions.
- CLS token shows highest effective dimensionality.
Method
TGO-I systematically investigates ViT spectral geometry using a ViT-Small/16 on ImageNet-100. It analyzes Effective Rank, Stable Rank, Participation Ratio, Spectral Entropy, Spectral Flatness, Spectral Anisotropy, covariance structure, eigenspectra, and singular value spectra during training.
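The summary names these metrics without defining them, so the following is only a minimal sketch using common textbook definitions, which may differ from the ones the paper uses. The function name `spectral_metrics`, the choice to compute Effective Rank from normalized singular values, and the definition of anisotropy as the top eigenvalue's share of total variance are all assumptions made for illustration.

```python
import numpy as np


def spectral_metrics(features: np.ndarray, eps: float = 1e-12) -> dict:
    """Spectral summaries of a (n_samples, dim) representation matrix.

    All definitions below are common conventions, assumed here for
    illustration; they are not taken from the paper.
    """
    # Center the features so the spectrum describes variance, not the mean.
    centered = features - features.mean(axis=0, keepdims=True)

    # Singular values of the centered data; their squares (scaled by n - 1)
    # are the eigenvalues of the sample covariance matrix.
    s = np.linalg.svd(centered, compute_uv=False)
    eig = s ** 2 / max(features.shape[0] - 1, 1)

    p_eig = eig / (eig.sum() + eps)  # normalized eigenvalue distribution
    p_sv = s / (s.sum() + eps)       # normalized singular value distribution

    return {
        # exp of the Shannon entropy of the normalized singular values
        "effective_rank": float(np.exp(-np.sum(p_sv * np.log(p_sv + eps)))),
        # squared Frobenius norm over squared spectral norm
        "stable_rank": float((s ** 2).sum() / (s.max() ** 2 + eps)),
        # (sum of eigenvalues)^2 / sum of squared eigenvalues
        "participation_ratio": float(eig.sum() ** 2 / ((eig ** 2).sum() + eps)),
        # Shannon entropy of the normalized eigenvalues
        "spectral_entropy": float(-np.sum(p_eig * np.log(p_eig + eps))),
        # geometric mean over arithmetic mean of the eigenvalues
        "spectral_flatness": float(
            np.exp(np.mean(np.log(eig + eps))) / (eig.mean() + eps)
        ),
        # share of total variance in the top direction (assumed definition)
        "spectral_anisotropy": float(eig.max() / (eig.sum() + eps)),
    }


# Synthetic sanity check, not data from the paper: an isotropic cloud
# versus one dominated by a single high-variance direction.
rng = np.random.default_rng(0)
iso = rng.normal(size=(2000, 16))
aniso = iso.copy()
aniso[:, 0] *= 100.0

m_iso = spectral_metrics(iso)
m_aniso = spectral_metrics(aniso)
```

Under these definitions, the trend reported for training (higher participation ratio and entropy, flatter spectrum, lower anisotropy) corresponds to moving from something like `m_aniso` toward something like `m_iso`.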
In practice
- Re-evaluate the assumption that ViT training concentrates information into a few dominant directions.
- When probing dimensionality, start with the final CLS token, which shows the highest effective dimensionality.
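One way to act on these points is to compare the dimensionality of the CLS token with that of the patch tokens for a given layer. The sketch below assumes hidden states arrive as an array of shape (batch, 1 + num_patches, dim) with the CLS token at index 0, which is the usual ViT layout but is not stated in the summary. The helper names, the mean-pooling of patch tokens, and the array sizes are hypothetical, and the data is synthetic, built only to exercise the code rather than to reproduce any result.

```python
import numpy as np


def participation_ratio(features: np.ndarray) -> float:
    """(sum of covariance eigenvalues)^2 / sum of squared eigenvalues."""
    centered = features - features.mean(axis=0, keepdims=True)
    cov = centered.T @ centered / (features.shape[0] - 1)
    # eigvalsh is for symmetric matrices; clip tiny negative round-off.
    eig = np.clip(np.linalg.eigvalsh(cov), 0.0, None)
    return float(eig.sum() ** 2 / (eig ** 2).sum())


def cls_and_patch_pr(hidden_states: np.ndarray) -> tuple:
    """Participation ratio of the CLS token and of mean-pooled patch tokens.

    Assumes hidden_states has shape (batch, 1 + num_patches, dim) and that
    index 0 along the token axis is the CLS token.
    """
    cls_tokens = hidden_states[:, 0, :]
    pooled_patches = hidden_states[:, 1:, :].mean(axis=1)
    return participation_ratio(cls_tokens), participation_ratio(pooled_patches)


# Synthetic stand-in for one layer's activations (illustrative sizes only).
rng = np.random.default_rng(0)
batch, tokens, dim = 1024, 5, 32
hidden = rng.normal(size=(batch, tokens, dim))

# Give every patch token one shared high-variance direction so the pooled
# patch representation is deliberately low-dimensional in this toy example.
hidden[:, 1:, 0] += 20.0 * rng.normal(size=(batch, 1))

pr_cls, pr_patch = cls_and_patch_pr(hidden)
```

With real activations, the same two numbers could be logged per layer and per checkpoint to see whether the CLS token's dimensionality grows over training as the summary describes.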
Topics
- Vision Transformers
- Representational Geometry
- Spectral Analysis
- Dimensionality Reduction
- CLS Token
- ImageNet-100
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.