Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders
Summary
A new study, "Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders," published on 2026-05-28, addresses the limitations of current positional encoding (PE) methods such as RoPE in Transformers, particularly for long-context understanding. The researchers modified an encoder Transformer to process three explicitly disentangled streams: semantic, absolute positional (AP), and relative positional (RP), with the masked-language-modeling (MLM) objective confined to the semantic stream. Two findings follow from this setup: the isolated AP subspace collapses into a low-frequency, two-dimensional manifold that captures document structure, and attention heads specialize. The study also finds that standard PEs do not robustly retain macroscopic structure, whereas the disentangled approach does, and that the disentangled model improved linguistic representation on 49 of 65 phenomena in the Flash-Holmes probing benchmark.
Key takeaway
For NLP engineers optimizing Transformer performance on long-context tasks, consider explicitly disentangling positional and semantic representations. In the study, this approach, which confines the masked-language-modeling objective to the semantic stream, preserved macroscopic structural information better than standard positional encodings such as RoPE. It also improved linguistic representation on 49 of 65 Flash-Holmes linguistic phenomena, so a disentangled architecture may be worth evaluating where document-level structure matters.
Key insights
Explicitly disentangling positional and semantic representations in encoder Transformers preserves macroscopic document structure better than standard PEs and improves linguistic representation on most Flash-Holmes phenomena (49 of 65).
Principles
- Positional and semantic signals occupy orthogonal subspaces.
- Attention heads specialize in either structure or semantics.
- Standard PEs such as RoPE do not robustly retain macroscopic document structure.
Method
Modify encoder Transformers to process three explicit streams: semantic, absolute positional (AP), and relative positional (RP), confining the masked-language-modeling (MLM) objective to the semantic stream.
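The three-stream idea can be illustrated with a minimal NumPy sketch of a single attention head. This is not the paper's implementation: the way the streams are combined (summing a semantic term, an AP term, and a per-offset RP bias in the attention logits), the names `init_params`, `three_stream_attention`, and `mlm_logits`, and all dimensions are assumptions made for illustration. The only property taken from the summary is that the MLM head reads from the semantic stream alone.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_params(rng, d_s, d_p):
    # Hypothetical parameter layout: separate projections per stream.
    def mat(d):
        return rng.normal(scale=d ** -0.5, size=(d, d))
    return {
        "Wq_s": mat(d_s), "Wk_s": mat(d_s), "Wv_s": mat(d_s),  # semantic
        "Wq_p": mat(d_p), "Wk_p": mat(d_p),                    # absolute positional
    }


def three_stream_attention(sem, ap, rp_bias, params):
    """One attention head over separate semantic, AP, and RP streams.

    sem:     (n, d_s) semantic token states
    ap:      (n, d_p) absolute-position states
    rp_bias: (2n-1,)  one scalar per relative offset i - j (assumed form)
    """
    n, d_s = sem.shape
    d_p = ap.shape[1]
    q_s, k_s, v_s = sem @ params["Wq_s"], sem @ params["Wk_s"], sem @ params["Wv_s"]
    q_p, k_p = ap @ params["Wq_p"], ap @ params["Wk_p"]
    # Map each relative offset i - j in [-(n-1), n-1] to an index in [0, 2n-2].
    offsets = np.arange(n)[:, None] - np.arange(n)[None, :] + (n - 1)
    logits = (
        q_s @ k_s.T / np.sqrt(d_s)    # semantic content term
        + q_p @ k_p.T / np.sqrt(d_p)  # absolute positional term
        + rp_bias[offsets]            # relative positional term
    )
    weights = softmax(logits, axis=-1)
    # Only the semantic stream receives a value update; AP passes through unchanged.
    new_sem = sem + weights @ v_s
    return new_sem, ap, weights


def mlm_logits(sem, w_vocab):
    # The MLM head sees the semantic stream only.
    return sem @ w_vocab


rng = np.random.default_rng(0)
n, d_s, d_p, vocab = 6, 16, 8, 50
params = init_params(rng, d_s, d_p)
sem = rng.normal(size=(n, d_s))
ap = rng.normal(size=(n, d_p))
rp_bias = rng.normal(size=(2 * n - 1,))
new_sem, ap_out, weights = three_stream_attention(sem, ap, rp_bias, params)
logits = mlm_logits(new_sem, rng.normal(size=(d_s, vocab)))
print(new_sem.shape, ap_out.shape, logits.shape)
```

In this sketch positional information can influence which tokens attend to each other, but it is never written into the states from which the MLM predictions are read. Whether the paper routes the streams this way is not stated in the summary.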
In practice
- Implement separate streams for semantic, AP, and RP.
- Restrict MLM objective to the semantic stream.
- Analyze the isolated AP subspace for low-frequency, two-dimensional manifold structure.
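One simple way to check whether a set of positional representations lies close to a two-dimensional manifold is to measure how much variance the top two principal components explain. The sketch below is an assumption about how such an analysis could be run, not the study's procedure: the helper `explained_variance_ratio` is hypothetical, and the `ap_embeddings` matrix is synthetic data built to be nearly two-dimensional, standing in for AP states extracted from a trained model.

```python
import numpy as np


def explained_variance_ratio(embeddings):
    """Fraction of total variance carried by each principal component."""
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    # Singular values of the centered matrix give the PCA spectrum.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


# Synthetic stand-in for AP states: a low-frequency 2D curve (one cycle over
# the document) mixed linearly into a higher-dimensional space, plus noise.
rng = np.random.default_rng(0)
n_positions, dim = 512, 64
t = np.linspace(0.0, 1.0, n_positions)
latent = np.stack([np.cos(2 * np.pi * t), np.sin(2 * np.pi * t)], axis=1)
mixing = rng.normal(size=(2, dim))
ap_embeddings = latent @ mixing + 0.01 * rng.normal(size=(n_positions, dim))

ratio = explained_variance_ratio(ap_embeddings)
top2 = ratio[:2].sum()
print(f"variance explained by top-2 components: {top2:.4f}")
```

A top-2 ratio close to 1 is consistent with a two-dimensional structure. For real model states, plotting the first two components against token position would show whether that structure tracks document layout.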
Topics
- Positional Encoding
- Transformers
- Semantic Representation
- Absolute Positional Encoding
- Relative Positional Encoding
- Masked Language Modeling
- Flash-Holmes Benchmark
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.