Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders

2026-05-28 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

A study published on 2026-05-28, "Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders," addresses the limitations of current positional encoding (PE) methods such as RoPE in Transformers, particularly for long-context understanding. The researchers modified an encoder Transformer to process three explicitly disentangled streams: semantic, absolute positional (AP), and relative positional (RP), and confined the masked-language-modeling (MLM) objective to the semantic stream. Two findings follow: the isolated AP subspace collapses into a low-frequency two-dimensional manifold that captures document structure, and attention heads specialize. Standard PEs, by contrast, do not robustly retain macroscopic structure. The disentangled approach improved linguistic representation on 49 of 65 linguistic phenomena in the Flash-Holmes probing benchmark.

Key takeaway

NLP engineers optimizing Transformers for long-context tasks should consider explicitly disentangling positional and semantic representations. The approach confines the masked-language-modeling (MLM) objective to the semantic stream and, according to the study, preserves macroscopic structural information better than standard positional encodings such as RoPE. The reported gains on 49 of 65 Flash-Holmes linguistic phenomena suggest that such an architecture can improve linguistic representation.

Key insights

Explicitly disentangling positional and semantic representations in encoder Transformers preserves macroscopic positional structure and improves linguistic representation.

Principles

Positional and semantic signals occupy orthogonal subspaces.
Attention heads specialize for structure or semantics.
Standard PEs do not robustly retain macroscopic structure.

Method

Modify encoder Transformers to process three explicit streams: semantic, absolute positional (AP), and relative positional (RP), confining the masked-language-modeling (MLM) objective to the semantic stream.
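
The idea of confining the MLM objective to the semantic stream can be illustrated with a toy example. The sketch below is not the paper's architecture, which this summary does not describe in detail. It assumes, purely for illustration, that the three streams are contiguous slices of one hidden vector and that the MLM head reads only the semantic slice. All names and sizes (`split_streams`, `mlm_loss`, `D_SEM`, `D_AP`, `D_RP`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes, chosen only for illustration.
VOCAB, D_SEM, D_AP, D_RP = 50, 16, 4, 4


def split_streams(hidden):
    """Split a (seq_len, D_SEM + D_AP + D_RP) array into three streams."""
    sem = hidden[:, :D_SEM]
    ap = hidden[:, D_SEM:D_SEM + D_AP]
    rp = hidden[:, D_SEM + D_AP:]
    return sem, ap, rp


def mlm_loss(hidden, w_out, targets, masked):
    """Mean cross-entropy over masked positions, read from the semantic stream only."""
    sem, _ap, _rp = split_streams(hidden)
    logits = sem @ w_out                                   # (seq_len, VOCAB)
    logits = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    token_nll = -log_probs[np.arange(len(targets)), targets]
    return float(token_nll[masked].mean())


seq_len = 8
hidden = rng.normal(size=(seq_len, D_SEM + D_AP + D_RP))
w_out = rng.normal(size=(D_SEM, VOCAB))
targets = rng.integers(0, VOCAB, size=seq_len)
masked = np.array([False, True, False, False, True, False, True, False])

loss = mlm_loss(hidden, w_out, targets, masked)

# Perturb only the AP and RP slices; the semantic slice is left untouched.
perturbed = hidden.copy()
perturbed[:, D_SEM:] += rng.normal(size=(seq_len, D_AP + D_RP))
loss_perturbed = mlm_loss(perturbed, w_out, targets, masked)

print(np.isclose(loss, loss_perturbed))
```

Because the loss is computed from the semantic slice alone, changing the positional slices does not change it. In a trained model this would mean the MLM head receives no direct signal from the AP and RP streams.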

In practice

Implement separate streams for semantic, AP, and RP representations.
Restrict the MLM objective to the semantic stream.
Analyze the AP subspace for two-dimensional manifold structure.
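
One way to approach the last step is to check how much variance of the AP vectors falls in their top two principal components. The sketch below uses synthetic data as a stand-in: a single slow cycle over the document, embedded in a higher-dimensional space with a little noise. It does not use the paper's vectors or its analysis procedure. The helper `explained_variance_ratio` and all sizes are assumptions made for illustration.

```python
import numpy as np


def explained_variance_ratio(x):
    """Fraction of total variance captured by each principal component of x (n, d)."""
    centered = x - x.mean(axis=0, keepdims=True)
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2
    return variances / variances.sum()


rng = np.random.default_rng(0)
n_positions, dim = 512, 32  # hypothetical sequence length and AP width

# Synthetic stand-in for AP vectors: one low-frequency cycle across positions.
t = np.linspace(0.0, 2.0 * np.pi, n_positions, endpoint=False)
plane = np.stack([np.cos(t), np.sin(t)], axis=1)        # (n_positions, 2)

# Embed the 2D curve in `dim` dimensions via a random orthonormal basis.
basis, _ = np.linalg.qr(rng.normal(size=(dim, 2)))      # (dim, 2)
ap_vectors = plane @ basis.T + 0.01 * rng.normal(size=(n_positions, dim))

ratio = explained_variance_ratio(ap_vectors)
top2 = float(ratio[:2].sum())
print(f"variance in top-2 components: {top2:.3f}")
```

With real AP representations, `ap_vectors` would be replaced by the extracted positional stream, one row per position. A top-two ratio close to 1 is consistent with a two-dimensional manifold. A flatter spectrum would suggest the structure is spread over more dimensions.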

Topics

Positional Encoding
Transformers
Semantic Representation
Absolute Positional Encoding
Relative Positional Encoding
Masked Language Modeling
Flash-Holmes Benchmark

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.