An expressivity analysis of hierarchical modelling in deep transformers via bounded-depth grammars

2026-06-16 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

An expressivity analysis published on 2026-06-16 examines how deep transformers represent hierarchical structure. Working with bounded-depth, non-recursive context-free grammars, it explicitly constructs transformers with positional attention that represent these grammars. In the construction, transformer depth grows linearly with grammar depth, while the neuron count scales with the number of derivation-tree shapes and quadratically with the number of production rules. The results support the linear representation hypothesis: they show that these architectures have the structural capacity to encode abstract grammatical states in low-dimensional, linearly separable subspaces of the residual stream, which helps clarify the source of their expressive power in language modeling.

Key takeaway

For research scientists investigating transformer capabilities in language modeling, this analysis provides a theoretical foundation for understanding hierarchical representation power. When designing or evaluating transformer architectures for tasks that require complex syntactic or semantic understanding, consider the formal scaling properties: depth that grows linearly with grammar depth, and a neuron count that grows with the number of derivation-tree shapes and quadratically with the number of production rules. These give a way to check whether a model has adequate structural capacity.

Key insights

Deep transformers can represent the hierarchical structures generated by bounded-depth grammars, encoding grammatical states in linearly separable subspaces.

Principles

Hierarchical representations drive deep neural network expressivity.
Transformer depth can scale linearly with grammar depth.
Grammatical states map to low-dimensional subspaces.
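
As a rough illustration of the last principle, the sketch below places three grammatical states along orthonormal directions of a toy 4-dimensional "residual stream" and recovers them with a plain dot-product readout. It is not the paper's construction: the state names, the directions, the dimension, and the `OTHER_FEATURE` direction are all hypothetical choices made for the example.

```python
import random

# Hypothetical orthonormal directions in a 4-dimensional toy residual stream.
# Each grammatical state gets its own direction (assumption for illustration).
STATE_DIRECTIONS = {
    "NP": [0.5, 0.5, 0.5, 0.5],
    "VP": [0.5, -0.5, 0.5, -0.5],
    "S": [0.5, 0.5, -0.5, -0.5],
}
# A direction orthogonal to all state directions, standing in for
# unrelated information stored elsewhere in the residual stream.
OTHER_FEATURE = [0.5, -0.5, -0.5, 0.5]


def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def encode(state, other_strength):
    """Residual vector: the state direction plus an unrelated component."""
    direction = STATE_DIRECTIONS[state]
    return [s + other_strength * o for s, o in zip(direction, OTHER_FEATURE)]


def linear_readout(residual):
    """Pick the state whose direction has the largest dot product."""
    scores = {name: dot(residual, d) for name, d in STATE_DIRECTIONS.items()}
    return max(scores, key=scores.get)


rng = random.Random(0)
samples = [
    (state, encode(state, rng.uniform(-3.0, 3.0)))
    for state in STATE_DIRECTIONS
    for _ in range(5)
]
accuracy = sum(linear_readout(vec) == state for state, vec in samples) / len(samples)
print(f"linear readout accuracy: {accuracy:.2f}")
```

Because the unrelated component is orthogonal to every state direction, it does not change the readout scores, so a purely linear probe separates the states regardless of its strength.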

Method

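Explicitly construct transformers with positional attention that represent bounded-depth, non-recursive context-free grammars. In the construction, model depth scales linearly with grammar depth, and neuron count scales with the number of derivation-tree shapes and quadratically with the number of production rules.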
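
To make the quantities in these scaling statements concrete, here is a minimal sketch that computes them for a toy grammar. The grammar itself is hypothetical, and the number of derivation trees is used only as a stand-in for "derivation-tree shapes", since the summary does not say how the paper defines a shape. The code measures the grammar only; it does not build a transformer or reproduce the paper's constants.

```python
from functools import lru_cache
from math import prod

# Hypothetical toy grammar: nonterminal -> list of right-hand sides.
# Any symbol that is not a key is treated as a terminal.
GRAMMAR = {
    "S": [("NP", "VP")],
    "NP": [("Det", "N"), ("N",)],
    "VP": [("V", "NP"), ("V",)],
    "Det": [("the",)],
    "N": [("cat",), ("dog",)],
    "V": [("sees",), ("sleeps",)],
}


def grammar_depth(symbol, _stack=()):
    """Height of the tallest derivation tree rooted at `symbol`.

    Raises ValueError if the grammar is recursive, since the analysis
    covers only bounded-depth, non-recursive grammars.
    """
    if symbol not in GRAMMAR:
        return 0  # terminals sit at depth 0
    if symbol in _stack:
        raise ValueError(f"recursive nonterminal: {symbol}")
    stack = _stack + (symbol,)
    return 1 + max(
        grammar_depth(child, stack) for rhs in GRAMMAR[symbol] for child in rhs
    )


@lru_cache(maxsize=None)
def count_derivation_trees(symbol):
    """Number of distinct derivation trees rooted at `symbol`."""
    if symbol not in GRAMMAR:
        return 1
    return sum(
        prod(count_derivation_trees(child) for child in rhs)
        for rhs in GRAMMAR[symbol]
    )


depth = grammar_depth("S")  # also verifies the grammar is non-recursive
num_rules = sum(len(rhss) for rhss in GRAMMAR.values())
num_trees = count_derivation_trees("S")

print("grammar depth:", depth)
print("production rules:", num_rules)
print("derivation trees:", num_trees)
```

For this toy grammar the depth is 4, there are 10 production rules, and there are 40 derivation trees. Under the stated scaling, the first number governs the required transformer depth, while the latter two govern the neuron count.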

Topics

Deep Transformers
Hierarchical Representations
Context-Free Grammars
Expressivity Analysis
Positional Attention
Language Modeling

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.