Contribution Weights: A Geometrical Analysis of Self-Attention Transformers

2026-05-29 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

"Contribution Weights" is a projection-based metric for interpreting information flow in the self-attention layers of Large Language Models (LLMs). It quantifies a token's influence by combining its attention weight, the magnitude of its value vector, and that vector's directional alignment with the layer output. This addresses a limitation of raw attention weights, which ignore the geometry of the value vectors. The research reports that Contribution Weights are a more faithful measure of token importance, consistently outperforming attention-based metrics at identifying semantically critical tokens across diverse decoder-only models, tasks, and datasets. The metric also supports a mechanistic analysis of "attention sinks": the analysis indicates that sinks actively suppress information, with a convex relationship between sink rate and output norm, and that they stabilize representations by counteracting the semantic drift of low-confidence tokens.

Key takeaway

For Machine Learning Engineers working on LLM interpretability and debugging, raw attention weights give an incomplete view of token importance. Consider adopting "Contribution Weights", which also account for value vector geometry, to get a more faithful picture of information flow. According to the paper, the metric identifies semantically critical tokens more accurately and supports mechanistic analysis of attention sinks, which may in turn inform work on model stability and more robust LLM designs.

Key insights

Contribution Weights offer a geometrically informed, more faithful measure of token importance and reveal an active role for attention sinks in LLMs.

Principles

Attention interpretation benefits from geometric vector analysis.
Token influence depends on weight, magnitude, and alignment.
Attention sinks actively stabilize LLM representations.

Method

The paper introduces Contribution Weights, a projection-based metric, to quantify token influence by combining attention weight, value magnitude, and directional alignment with the layer output.
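
The summary does not give the paper's exact formula, so the following is only a minimal sketch of one way the three ingredients could be combined. It assumes a single query position in a single head, takes the layer output to be the attention-weighted sum of value vectors, and scores each token by the signed projection of its weighted value vector onto that output. The function name contribution_weights, the normalization, and the toy numbers are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def contribution_weights(attn, values, eps=1e-12):
    """Hypothetical projection-based token contributions for one query.

    attn:   (n,) attention weights over n tokens (assumed to sum to 1)
    values: (n, d) value vectors for the same tokens
    Returns an (n,) array of signed contributions that sums to 1.
    """
    attn = np.asarray(attn, dtype=float)
    values = np.asarray(values, dtype=float)

    # Assumed layer output: attention-weighted sum of the value vectors.
    output = attn @ values
    out_norm = np.linalg.norm(output)
    if out_norm < eps:
        # Degenerate case: no output direction to project onto.
        return np.zeros_like(attn)

    # Scalar projection of each weighted value vector onto the output
    # direction: a_i * ||v_i|| * cos(angle between v_i and output).
    projections = attn * (values @ output) / out_norm

    # Dividing by ||output|| makes the contributions sum to 1 by construction.
    return projections / out_norm

# Toy example (made-up numbers): token 0 receives the most attention,
# but its value vector is small, so it moves the output very little.
attn = np.array([0.7, 0.2, 0.1])
values = np.array([[0.1, 0.0],
                   [2.0, 0.0],
                   [0.0, -1.0]])
cw = contribution_weights(attn, values)
print(cw)
```

In this toy case the token with attention weight 0.7 ends up with a contribution of roughly 0.14, while the token with attention weight 0.2 accounts for roughly 0.81 of the output, which is the kind of gap between attention and influence that the metric is meant to expose.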

In practice

Use Contribution Weights for more accurate LLM interpretability.
Apply the metric to identify critical tokens in decoder-only models.
Investigate attention sink behavior for model stabilization.
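
As a starting point for investigating sink behavior, the sketch below sweeps the share of attention placed on a sink token and records the resulting output norm. It rests on simplifying assumptions that are not taken from the paper: "sink rate" is read as the attention mass s on a single sink token, the remaining mass (1 - s) goes to a fixed aggregate of content tokens, and the sink's value vector is small. Under those assumptions the output is affine in s, so its norm is a convex function of s. The function name and all numbers are hypothetical.

```python
import numpy as np

def output_norm_vs_sink_rate(sink_value, content_output, sink_rates):
    """Norm of s * v_sink + (1 - s) * u for each sink rate s in [0, 1].

    sink_value:     (d,) value vector of the sink token (assumed small)
    content_output: (d,) aggregate output of the non-sink tokens
    sink_rates:     iterable of attention mass placed on the sink token
    """
    sink_value = np.asarray(sink_value, dtype=float)
    content_output = np.asarray(content_output, dtype=float)
    s = np.asarray(sink_rates, dtype=float)[:, None]

    # Output is an affine function of the sink rate under these assumptions.
    outputs = s * sink_value + (1.0 - s) * content_output
    return np.linalg.norm(outputs, axis=1)

rates = np.linspace(0.0, 1.0, 11)
norms = output_norm_vs_sink_rate([0.05, 0.0], [1.0, 0.5], rates)

# Discrete convexity check: second differences of a convex function
# sampled on an even grid are non-negative (up to rounding error).
second_diff = norms[:-2] - 2.0 * norms[1:-1] + norms[2:]
print(norms)
print(second_diff)
```

On a real model the same sweep would use measured sink attention and hidden states instead of synthetic vectors, and the shape of the curve would need to be checked empirically rather than assumed.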

Topics

Large Language Models
Self-Attention Transformers
LLM Interpretability
Contribution Weights
Attention Sinks
Geometric Analysis

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.