Relational Rank Geometry in Transformers: Detecting and Steering Hidden-State Relation Frames

Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new paper investigates relational rank geometry in Transformer hidden states, offering an interpretation that complements local features such as neurons or attention heads. It uses Plücker sign entropy to detect arity-matched orientation signatures among token tuples. Across Llama-family 8B, 70B, and 405B checkpoints, true relation tuples showed consistently stronger orientation-sign consistency than scrambled tuples at the expected rank k=r, for r=3 through 6. The research further shows that this relation geometry can be steered: in an edge-grid clean/corrupt intervention assay, patching corrupt hidden-state relation frames toward clean targets in the 70B and 405B models recovered both clean-answer behavior and residual relation geometry, whereas the control interventions did not.

Key takeaway

For AI scientists working on Transformer interpretability or steerability, this research offers a way to detect and manipulate multi-argument relations within hidden states, moving beyond local features. Targeting specific relation-frame geometries could allow more precise behavioral control, opening avenues for fine-grained model editing and for understanding how models encode abstract relationships. Consider applying these detection and steering methods to your own models.

Key insights

Transformers encode multi-argument relations as detectable and steerable rank-indexed hidden-state geometries.

Method

Plücker sign entropy tests r-argument relations for arity-matched orientation signatures. The intervention patches corrupt hidden-state relation frames toward clean targets.
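
To make the two ideas above concrete, here is a minimal sketch on synthetic data. It is an illustration under stated assumptions, not the paper's implementation: the summary does not give the exact definition of Plücker sign entropy or of the patching step, so this sketch assumes (a) the orientation of an r-tuple is the sign of the r×r determinant of its hidden states after a projection to r dimensions, (b) the entropy is the binary entropy of those signs across tuples, and (c) patching is a linear interpolation toward the clean frame. The names `orientation_sign`, `sign_entropy`, `patch_toward_clean`, and `demo`, the projection matrix, and the noise scale are all hypothetical.

```python
import numpy as np

def orientation_sign(frame, proj):
    """Sign of the r x r determinant of an (r, d) frame projected by a (d, r) matrix.

    Assumption: this single minor stands in for the Plücker orientation at rank k = r.
    """
    return float(np.sign(np.linalg.det(frame @ proj)))

def sign_entropy(signs):
    """Binary entropy in bits of the share of positive signs (0 means fully consistent)."""
    p = float(np.mean(np.asarray(signs) > 0))
    if p == 0.0 or p == 1.0:
        return 0.0
    return float(-(p * np.log2(p) + (1 - p) * np.log2(1 - p)))

def patch_toward_clean(corrupt, clean, alpha=1.0):
    """Move a corrupt (r, d) frame a fraction alpha of the way toward the clean frame."""
    return corrupt + alpha * (clean - corrupt)

def demo(r=3, d=16, n=200, seed=0):
    """Compare sign entropy of synthetic 'true' tuples against row-scrambled tuples."""
    rng = np.random.default_rng(seed)
    # Hypothetical projection with orthonormal columns.
    proj, _ = np.linalg.qr(rng.standard_normal((d, r)))
    # Synthetic 'true' relation frame, chosen so that base @ proj is the identity.
    base = proj.T
    true_signs, scrambled_signs = [], []
    for _ in range(n):
        frame = base + 0.01 * rng.standard_normal((r, d))
        true_signs.append(orientation_sign(frame, proj))
        # Scrambling the argument order flips the sign for odd permutations.
        scrambled = frame[rng.permutation(r)]
        scrambled_signs.append(orientation_sign(scrambled, proj))
    return sign_entropy(true_signs), sign_entropy(scrambled_signs)

if __name__ == "__main__":
    h_true, h_scrambled = demo()
    print(f"true tuples: {h_true:.3f} bits, scrambled tuples: {h_scrambled:.3f} bits")
```

In this toy setup, consistently ordered tuples share one orientation sign and give low entropy, while scrambled orderings split between both signs and give entropy near one bit. With real models, `frame` would hold the hidden states at the relation's argument positions, and the patched frame from `patch_toward_clean` would be written back into the forward pass; both steps depend on details the summary does not specify.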

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.