An expressivity analysis of hierarchical modelling in deep transformers via bounded-depth grammars
Summary
An expressivity analysis published on 2026-06-16 examines how deep transformer models represent hierarchical structure. Using bounded-depth, non-recursive context-free grammars as a formal lens, the work explicitly constructs transformers with positional attention that represent such grammars. In the construction, transformer depth grows linearly with grammar depth, while the neuron count scales with the number of derivation-tree shapes and quadratically with the number of production rules. These theoretical results support the linear representation hypothesis: the architectures have the structural capacity to encode abstract grammatical states in low-dimensional, linearly separable subspaces of the residual stream, which helps clarify the source of their expressive power in language modeling.
Key takeaway
For research scientists investigating transformer capabilities in language modeling, this analysis provides a theoretical foundation for understanding hierarchical representation power. Consider its formal scaling properties (depth linear in grammar depth; neuron count growing with the number of derivation-tree shapes and quadratically with the number of production rules) when designing or evaluating transformer architectures for tasks that require complex syntactic or semantic understanding, so that models have adequate structural capacity.
Key insights
Deep transformers can provably represent the hierarchical structures generated by bounded-depth grammars, encoding grammatical states in low-dimensional, linearly separable subspaces of the residual stream.
Principles
- Hierarchical representations drive deep neural network expressivity.
- Transformer depth can scale linearly with grammar depth.
- Grammatical states map to low-dimensional subspaces.
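As a toy illustration of what "linearly separable subspaces" can mean, the sketch below writes a grammatical state into a few reserved coordinates of a residual vector and recovers it with a purely linear readout. This is not the construction from the analysis: the state names, the width `D_MODEL`, the one-hot encoding, and the helper functions `write_state` and `read_state` are all assumptions made for illustration.

```python
# Hypothetical grammatical states; the names are illustrative only.
STATES = ["NP", "VP", "Det"]
D_MODEL = 8          # toy residual-stream width (assumption)
K = len(STATES)      # coordinates reserved for the state subspace

def write_state(residual, state):
    """Return a copy of `residual` with `state` one-hot encoded in its first K coordinates."""
    out = list(residual)
    for i, name in enumerate(STATES):
        out[i] = 1.0 if name == state else 0.0
    return out

def read_state(residual):
    """Linear readout: dot each one-hot probe with the residual, then take the argmax."""
    probes = [[1.0 if j == i else 0.0 for j in range(D_MODEL)] for i in range(K)]
    scores = [sum(p * x for p, x in zip(probe, residual)) for probe in probes]
    return STATES[scores.index(max(scores))]

# Unrelated features live in the remaining coordinates and do not disturb the readout.
other_content = [0.0] * K + [0.7, -1.3, 2.1, 0.4, -0.9]
decoded = [read_state(write_state(other_content, s)) for s in STATES]
print(decoded)
```

Because the state occupies its own K-dimensional subspace, the readout is a single linear map followed by an argmax, regardless of what the remaining coordinates hold.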
Method
Explicitly construct transformers with positional attention that represent bounded-depth grammars, with model depth scaling linearly in grammar depth and neuron count scaling with the number of derivation-tree shapes and quadratically with the number of production rules.
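To make the grammar-side quantities concrete, here is a minimal sketch that computes them for a small non-recursive context-free grammar. The grammar, the function names, and the exact definitions are assumptions, not taken from the analysis: depth is taken as the longest chain of nonterminals from the start symbol, and the tree count treats two derivation trees as distinct whenever they apply a different production at some node, which may differ from the notion of "derivation-tree shape" used in the original work.

```python
from functools import lru_cache
from math import prod

# Hypothetical toy grammar (not from the analysis): nonterminal -> list of right-hand sides.
GRAMMAR = {
    "S": [["NP", "VP"]],
    "NP": [["Det", "N"], ["N"]],
    "VP": [["V", "NP"], ["V"]],
    "Det": [["the"]],
    "N": [["cat"], ["dog"]],
    "V": [["sees"], ["sleeps"]],
}

def check_non_recursive(grammar):
    """Raise ValueError if any nonterminal can derive itself."""
    state = {}  # symbol -> "visiting" or "done"

    def visit(sym):
        if sym not in grammar:          # terminal symbol
            return
        if state.get(sym) == "visiting":
            raise ValueError(f"recursive nonterminal: {sym}")
        if state.get(sym) == "done":
            return
        state[sym] = "visiting"
        for rhs in grammar[sym]:
            for child in rhs:
                visit(child)
        state[sym] = "done"

    for nonterminal in grammar:
        visit(nonterminal)

def grammar_stats(grammar, start="S"):
    """Return depth, number of production rules, and number of derivation trees."""
    check_non_recursive(grammar)

    @lru_cache(maxsize=None)
    def depth(sym):
        if sym not in grammar:
            return 0
        return 1 + max(depth(c) for rhs in grammar[sym] for c in rhs)

    @lru_cache(maxsize=None)
    def trees(sym):
        if sym not in grammar:
            return 1
        # Sum over alternative productions, multiply over the children of each.
        return sum(prod(trees(c) for c in rhs) for rhs in grammar[sym])

    return {
        "depth": depth(start),
        "num_rules": sum(len(alternatives) for alternatives in grammar.values()),
        "num_trees": trees(start),
    }

stats = grammar_stats(GRAMMAR)
print(stats)  # {'depth': 4, 'num_rules': 10, 'num_trees': 40}
```

Under the scaling described above, a grammar like this one would call for a transformer whose depth grows in proportion to `depth`, with a neuron budget that grows with the tree count and with the square of `num_rules`; the constants and the exact form of that bound are not given in this summary.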
Topics
- Deep Transformers
- Hierarchical Representations
- Context-Free Grammars
- Expressivity Analysis
- Positional Attention
- Language Modeling
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.