Layerwise Dynamics for In-Context Classification in Transformers

2026-04-21 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

Transformers can perform in-context classification from a few labeled examples, but the algorithm they execute at inference time has remained opaque. This research identifies an explicit, depth-indexed recursion inside softmax transformers trained for multi-class linear classification, obtained by enforcing feature- and label-permutation equivariance at every layer. The enforcement step, which conjugates the attention block with a random block permutation, preserves the inference rule while making the internal computation interpretable. The resulting dynamics reveal a "coupled mean-shift" algorithmic motif: attention matrices built from a mixed feature-label Gram structure drive coupled updates of the training points, their labels, and the test probe. This geometry-driven process provably amplifies class separation and yields robust expected class alignment, and the same motif reappears when transformers are retrained on semi-supervised, label-noise, and prototype classification tasks.
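To make the coupled mean-shift motif concrete, here is a minimal NumPy sketch under stated assumptions; it is not the recursion derived in the paper. It assumes each token holds a feature vector and a label vector, that attention scores are the mixed Gram `x·x' + beta * y·y'` computed against the training tokens, and that each layer moves every token (training points, their labels, and the test probe) a fraction `eta` toward its attention-weighted mean. The names `coupled_mean_shift_layer`, `classify`, `beta`, `eta`, and `tau` are hypothetical.

```python
import numpy as np


def softmax_rows(s):
    # Numerically stable row-wise softmax
    s = s - s.max(axis=1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)


def coupled_mean_shift_layer(X, Y, x_q, y_q, beta=1.0, eta=0.5, tau=1.0):
    """One hypothetical layer (an illustration, not the paper's recursion).

    X: (n, d) training features, Y: (n, c) training label vectors,
    x_q: (d,) test probe features, y_q: (c,) probe label estimate.
    """
    Qx = np.vstack([X, x_q])  # queries: the n training tokens plus the probe
    Qy = np.vstack([Y, y_q])
    # Mixed feature-label Gram against the training tokens (the keys)
    A = softmax_rows((Qx @ X.T + beta * Qy @ Y.T) / tau)  # shape (n + 1, n)
    # Mean-shift step: move each token a fraction eta toward its attention-weighted mean
    Qx = (1 - eta) * Qx + eta * (A @ X)
    Qy = (1 - eta) * Qy + eta * (A @ Y)
    return Qx[:-1], Qy[:-1], Qx[-1], Qy[-1]


def classify(X, labels, x_q, num_classes, depth=4, **layer_kwargs):
    Y = np.eye(num_classes)[labels]  # one-hot training labels
    y_q = np.full(num_classes, 1.0 / num_classes)  # probe starts from a uniform guess
    for _ in range(depth):
        X, Y, x_q, y_q = coupled_mean_shift_layer(X, Y, x_q, y_q, **layer_kwargs)
    return int(np.argmax(y_q)), y_q


rng = np.random.default_rng(0)
X = np.vstack([rng.normal([-3.0, 0.0], 0.5, size=(10, 2)),
               rng.normal([3.0, 0.0], 0.5, size=(10, 2))])
labels = np.array([0] * 10 + [1] * 10)
pred, scores = classify(X, labels, np.array([2.5, 0.3]), num_classes=2)
print(pred, np.round(scores, 3))
```

With two well-separated clusters, the probe attends almost entirely to the nearer class, so its label vector concentrates on that class over depth. Restricting keys to training tokens is a design choice of this sketch that keeps the probe from attending to itself; every update is a convex combination, so label vectors keep summing to one.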

Key takeaway

For AI scientists and research scientists working on transformer interpretability, this work shows that enforcing feature and label symmetries can reveal the algorithmic dynamics underlying in-context classification. Consider applying symmetry-preserving architectural constraints to make complex model behavior algebraically readable, rather than relying on abstract analogies such as gradient descent. The approach offers a concrete, testable framework for understanding how transformers perform in-context learning, which can support more principled design and analysis of in-context classifiers.

Key insights

Enforcing task symmetries reveals a closed-form, layerwise algorithmic recursion in transformers for in-context classification.

Principles

Symmetry enforcement aids interpretability.
Feature and label geometry co-evolve.
Dynamics amplify class separation.
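The principle that the dynamics amplify class separation can be probed numerically in a toy setting. The sketch below is a hypothetical simplification, not the paper's recursion: it iterates a coupled mean-shift step on training tokens only and tracks a separation ratio (distance between class means divided by average within-class spread). `mean_shift_step`, `separation_ratio`, `beta`, and `eta` are illustrative names, and the ratio is a metric chosen for this example rather than one taken from the source.

```python
import numpy as np


def softmax_rows(s):
    # Numerically stable row-wise softmax
    s = s - s.max(axis=1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)


def mean_shift_step(X, Y, beta=1.0, eta=0.5):
    # Hypothetical coupled update driven by the mixed Gram X X^T + beta * Y Y^T
    A = softmax_rows(X @ X.T + beta * Y @ Y.T)
    return (1 - eta) * X + eta * (A @ X), (1 - eta) * Y + eta * (A @ Y)


def separation_ratio(X, labels):
    # Distance between the two class means divided by the mean within-class spread
    means = [X[labels == k].mean(axis=0) for k in (0, 1)]
    spreads = [np.linalg.norm(X[labels == k] - means[k], axis=1).mean() for k in (0, 1)]
    return np.linalg.norm(means[1] - means[0]) / np.mean(spreads)


rng = np.random.default_rng(1)
X = np.vstack([rng.normal([-2.0, 0.0], 0.5, size=(15, 2)),
               rng.normal([2.0, 0.0], 0.5, size=(15, 2))])
labels = np.array([0] * 15 + [1] * 15)
Y = np.eye(2)[labels]

ratios = [separation_ratio(X, labels)]
for _ in range(5):
    X, Y = mean_shift_step(X, Y)
    ratios.append(separation_ratio(X, labels))
print(np.round(ratios, 2))
```

In this toy example each class contracts toward its own outermost points, so within-class spread shrinks while the class means drift apart, and the final ratio should exceed the initial one. Features and labels are updated by the same attention matrix, which is a simple way to see the two geometries co-evolve.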

Method

At every layer, enforce feature- and label-permutation symmetry by conjugating the attention block with a random block permutation, then extract the explicit layerwise recursion.
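One plausible reading of "conjugating the attention block with a random block permutation" is sketched below: permute the feature coordinates and the label coordinates of every token separately, apply attention, then apply the inverse permutation. This is an assumption about the construction, not the authors' code, and `attention`, `random_block_permutation`, and `conjugated_attention` are hypothetical names. The check uses block-scalar weights (a multiple of the identity on each block), for which conjugation leaves the output unchanged and the layer is equivariant.

```python
import numpy as np


def softmax_rows(s):
    # Numerically stable row-wise softmax
    s = s - s.max(axis=1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)


def attention(Z, W_q, W_k, W_v):
    # Single-head softmax attention over the rows (tokens) of Z
    return softmax_rows((Z @ W_q) @ (Z @ W_k).T) @ (Z @ W_v)


def random_block_permutation(d, c, rng):
    # Block-diagonal permutation matrix: shuffles the d feature coordinates
    # and the c label coordinates independently, never mixing the two blocks
    P = np.zeros((d + c, d + c))
    P[np.arange(d), rng.permutation(d)] = 1.0
    P[d + np.arange(c), d + rng.permutation(c)] = 1.0
    return P


def conjugated_attention(Z, W_q, W_k, W_v, d, c, rng):
    # Permute coordinates, apply attention, undo the permutation (P is orthogonal, so P.T = P^-1)
    P = random_block_permutation(d, c, rng)
    return attention(Z @ P, W_q, W_k, W_v) @ P.T


d, c, n = 3, 2, 6
rng = np.random.default_rng(0)
Z = rng.normal(size=(n, d + c))  # each row is a token [x_i ; y_i]
# Block-scalar weights (0.7 * I_d on features, 1.3 * I_c on labels) commute with every block permutation
W = np.diag([0.7] * d + [1.3] * c)
plain = attention(Z, W, W, W)
conj = conjugated_attention(Z, W, W, W, d, c, rng)
print(np.max(np.abs(plain - conj)))
```

With generic weights, each call instead effectively sees a different conjugate `P W P^T`, which is one way random conjugation during training could push parameters toward the equivariant family; the sketch does not test that training behavior.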

In practice

Use symmetry constraints for model interpretability.
Apply geometry-driven dynamics for classification.
Leverage semi-supervised in-context learning (ICL) for improved accuracy.
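As an illustration of the semi-supervised setting, the sketch below lets unlabeled context tokens join a hypothetical coupled mean-shift update, starting from a uniform label vector so that attention fills in their labels. It is not the paper's training setup; the configuration (one labeled example per class, `beta`, `eta`, four layers) and the name `semi_supervised_step` are assumptions chosen for the toy example, and no accuracy gain over a supervised variant is claimed here.

```python
import numpy as np


def softmax_rows(s):
    # Numerically stable row-wise softmax
    s = s - s.max(axis=1, keepdims=True)
    e = np.exp(s)
    return e / e.sum(axis=1, keepdims=True)


def semi_supervised_step(X, Y, beta=1.0, eta=0.5):
    # Every token, labeled or not, attends to every token via X X^T + beta * Y Y^T
    A = softmax_rows(X @ X.T + beta * Y @ Y.T)
    return (1 - eta) * X + eta * (A @ X), (1 - eta) * Y + eta * (A @ Y)


rng = np.random.default_rng(2)
num_classes = 2
X_lab = np.array([[-2.0, 0.0], [2.0, 0.0]])  # one labeled example per class
y_lab = np.array([0, 1])
X_unl = np.vstack([rng.normal([-2.0, 0.0], 0.5, size=(8, 2)),
                   rng.normal([2.0, 0.0], 0.5, size=(8, 2))])

X = np.vstack([X_lab, X_unl])
# Labeled tokens carry one-hot labels; unlabeled tokens start from a uniform label vector
Y = np.vstack([np.eye(num_classes)[y_lab],
               np.full((len(X_unl), num_classes), 1.0 / num_classes)])
for _ in range(4):
    X, Y = semi_supervised_step(X, Y)

pseudo_labels = Y[len(X_lab):].argmax(axis=1)
print(pseudo_labels)
```

Because a uniform label vector contributes the same label score to every key, an unlabeled token's label is driven by which labeled and pseudo-labeled tokens it attends to through the feature geometry.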

Topics

In-Context Learning
Transformer Architectures
Feature-Label Equivariance
Layerwise Dynamics
Coupled Mean-Shift Dynamics

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.