Relational Rank Geometry in Transformers: Detecting and Steering Hidden-State Relation Frames
Summary
A new paper investigates relational rank geometry in Transformer hidden states, offering an interpretation that complements local features such as neurons or attention heads. It uses Plücker sign entropy to detect arity-matched orientation signatures among token tuples. Across Llama-family 8B, 70B, and 405B checkpoints, true relation tuples consistently showed stronger orientation-sign consistency than scrambled tuples at the expected rank k = r, for arities r = 3 through 6. The paper further shows that this relation geometry can be steered. In an edge-grid clean/corrupt intervention assay on the 70B and 405B models, patching corrupt hidden-state relation frames toward clean targets recovered both clean-answer behavior and residual relation geometry, whereas the control interventions did not.
Key takeaway
For AI scientists working on Transformer interpretability or steerability, this research offers an approach that goes beyond local features: detecting multi-argument relations in hidden states and manipulating them directly. The reported results suggest that targeting specific relation-frame geometries can change model behavior, which opens avenues for fine-grained model editing and for understanding how models encode abstract relationships. Consider applying these detection and steering methods to your own models.
Key insights
Transformers encode multi-argument relations as detectable and steerable rank-indexed hidden-state geometries.
Principles
- Transformer hidden states contain rank-indexed relation geometry.
- Plücker sign entropy detects arity-matched orientation signatures.
- Relation geometry can be steered to recover specific behaviors.
Method
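Detection: Plücker sign entropy tests whether tuples in an r-argument relation carry an orientation signature at the matching rank k = r. Intervention: corrupt hidden-state relation frames are patched toward clean targets, and the effect on answers and on the remaining relation geometry is compared against controls.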
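This summary does not give the paper's exact definition of Plücker sign entropy, so the following is only a minimal sketch of one plausible reading: treat each tuple of k vectors as a k x d matrix, take the signs of its k x k minors (its Plücker coordinates), and measure how consistent each sign is across tuples using binary entropy. The function names `det`, `plucker_signs`, and `sign_entropy` are hypothetical, and the vectors are assumed to have already been projected to a small dimension d, since the number of minors grows combinatorially with d.

```python
import math
from itertools import combinations


def det(matrix):
    """Determinant by Gaussian elimination with partial pivoting."""
    n = len(matrix)
    a = [[float(x) for x in row] for row in matrix]
    sign, result = 1.0, 1.0
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < 1e-12:
            return 0.0  # singular up to tolerance
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            sign = -sign  # a row swap flips the orientation
        result *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return sign * result


def plucker_signs(frame):
    """Signs (+1, 0, -1) of all k x k minors of a k x d frame."""
    k, d = len(frame), len(frame[0])
    signs = []
    for cols in combinations(range(d), k):
        minor = [[row[c] for c in cols] for row in frame]
        value = det(minor)
        signs.append((value > 0) - (value < 0))
    return signs


def sign_entropy(frames):
    """Mean binary entropy (bits) of each minor's sign across frames.

    Assumption: low entropy means consistent orientation, which is how
    this sketch interprets "orientation-sign consistency".
    """
    per_frame = [plucker_signs(f) for f in frames]
    n_coords = len(per_frame[0])
    total = 0.0
    for j in range(n_coords):
        nonzero = [s[j] for s in per_frame if s[j] != 0]
        if not nonzero:
            continue  # degenerate coordinate contributes zero entropy
        p = sum(1 for s in nonzero if s > 0) / len(nonzero)
        if 0.0 < p < 1.0:
            total -= p * math.log2(p) + (1 - p) * math.log2(1 - p)
    return total / n_coords


# Toy frames (k = 2, d = 2), not real hidden states.
consistent = [[[1, 0], [0, 1]], [[2, 0], [0, 3]]]  # same orientation
mixed = [[[1, 0], [0, 1]], [[0, 1], [1, 0]]]       # opposite orientations
print(sign_entropy(consistent))  # 0.0
print(sign_entropy(mixed))       # 1.0
```

Under this reading, the comparison described in the summary would amount to computing such a score for true relation tuples and for scrambled tuples at each rank k, and checking whether the true tuples score lower at k = r.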
In practice
- Probe hidden states for specific relational structures.
- Intervene on relation frames to modify model behavior.
- Test on Llama-family models (8B, 70B, 405B).
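The intervention step can be illustrated with a small sketch. It assumes, as a simplification not confirmed by this summary, that "patching toward clean targets" means moving each corrupt hidden-state vector in a relation frame linearly toward its clean counterpart. The function `patch_toward_clean` and the strength parameter `alpha` are hypothetical names, and the vectors below are toy values rather than real activations.

```python
def patch_toward_clean(corrupt_frame, clean_frame, alpha=1.0):
    """Move each corrupt vector a fraction `alpha` toward its clean target.

    alpha = 0.0 leaves the corrupt frame unchanged; alpha = 1.0 replaces
    it with the clean frame. Linear interpolation is an assumption here.
    """
    if len(corrupt_frame) != len(clean_frame):
        raise ValueError("frames must contain the same number of vectors")
    patched = []
    for corrupt_vec, clean_vec in zip(corrupt_frame, clean_frame):
        if len(corrupt_vec) != len(clean_vec):
            raise ValueError("vectors must have the same dimension")
        patched.append(
            [c + alpha * (t - c) for c, t in zip(corrupt_vec, clean_vec)]
        )
    return patched


# Toy frame of two 2-dimensional "hidden states".
corrupt = [[0.0, 0.0], [4.0, 0.0]]
clean = [[2.0, 4.0], [0.0, 2.0]]

# Sweep the patch strength to see how far the frame has moved.
for alpha in (0.0, 0.5, 1.0):
    print(alpha, patch_toward_clean(corrupt, clean, alpha))
```

In a real model, the patched vectors would be written back into the forward pass at the chosen layer and token positions, and the output compared against the clean answer and against control patches; that wiring is model-specific and omitted here.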
Topics
- Transformer Interpretability
- Hidden States
- Relational Geometry
- Model Steering
- Plücker Sign Entropy
- Llama Models
Best for: Research Scientist, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.