Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

The study "Give it Space! Explicit Disentangling of Positional and Semantic Representations in Encoders," published on 2026-05-28, examines the limitations of current positional encoding (PE) methods such as RoPE in Transformers, particularly for long-context understanding. The researchers modified an encoder Transformer to process three explicitly disentangled streams: semantic, absolute positional (AP), and relative positional (RP), with the masked-language-modeling (MLM) objective confined to the semantic stream. Two findings emerged from this setup: the isolated AP subspace collapses into a low-frequency, two-dimensional manifold that captures document structure, and attention heads specialize. The study also reports that standard PEs do not robustly retain macroscopic structure, whereas the disentangled approach does, and that the disentangled model improved linguistic representation on 49 of 65 linguistic phenomena in the Flash-Holmes probing benchmark.

Key takeaway

For NLP engineers optimizing Transformer performance on long-context tasks, explicitly disentangling positional and semantic representations is worth considering. According to the study, this approach, which confines the MLM objective to the semantic stream, preserves macroscopic structural information better than standard positional encodings such as RoPE. It also improved linguistic representation on 49 of 65 Flash-Holmes linguistic phenomena.

Key insights

Explicitly disentangling positional and semantic representations in an encoder Transformer helps the model retain macroscopic document structure and improves linguistic representation on most Flash-Holmes phenomena (49 of 65).

Method

Modify encoder Transformers to process three explicit streams: semantic, absolute positional (AP), and relative positional (RP), confining the masked-language-modeling (MLM) objective to the semantic stream.
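
The idea of keeping three separate streams and reading the MLM loss from the semantic stream alone can be illustrated with a minimal pure-Python sketch. Everything here is an assumption made for illustration, not the paper's implementation: the function names `build_streams` and `mlm_loss`, the stream sizes, the sinusoidal AP features, and the signed-offset RP matrix are all hypothetical. The summary does not say how the streams interact inside attention, so the sketch leaves attention out entirely.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def build_streams(token_ids, vocab_size, d_sem=8, d_ap=4, seed=0):
    """Return three separate streams for one sequence (hypothetical layout)."""
    rng = random.Random(seed)
    n = len(token_ids)
    # Semantic stream: toy random token embeddings with no position signal.
    table = [[rng.gauss(0.0, 0.1) for _ in range(d_sem)] for _ in range(vocab_size)]
    semantic = [list(table[t]) for t in token_ids]
    # AP stream: sinusoidal features of the absolute index (assumption).
    absolute = [
        [math.sin(i / 10000 ** (k / d_ap)) if k % 2 == 0
         else math.cos(i / 10000 ** ((k - 1) / d_ap))
         for k in range(d_ap)]
        for i in range(n)
    ]
    # RP stream: signed pairwise offsets j - i (assumption).
    relative = [[j - i for j in range(n)] for i in range(n)]
    return {"semantic": semantic, "absolute": absolute, "relative": relative}


def mlm_loss(streams, out_proj, masked_positions, targets):
    """Mean cross-entropy at masked positions, read from the semantic stream only."""
    total = 0.0
    for pos, tgt in zip(masked_positions, targets):
        h = streams["semantic"][pos]  # AP and RP streams are never touched here
        logits = [sum(w * x for w, x in zip(row, h)) for row in out_proj]
        total -= math.log(softmax(logits)[tgt])
    return total / len(masked_positions)


vocab_size = 10
token_ids = [3, 1, 4, 1, 5]
streams = build_streams(token_ids, vocab_size)

# Toy output projection from the semantic width (8) to the vocabulary.
rng = random.Random(1)
out_proj = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(vocab_size)]

loss = mlm_loss(streams, out_proj, masked_positions=[1, 3], targets=[1, 1])
```

Because `mlm_loss` only indexes `streams["semantic"]`, overwriting the AP or RP stream leaves the loss unchanged. That is the property the sketch is meant to show; a real encoder would still need a defined way for the positional streams to inform attention.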

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.