Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete

2026-06-01 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Mathematics & Computational Sciences · Depth: Expert, medium

Summary

The paper "Rethinking the Role of Positional Encoding: Sliding-Window Transformers without PE Remain Turing Complete" challenges the long-held belief that positional encoding (PE) is necessary for transformers to process ordered sequences and achieve Turing completeness. The authors argue that for long-form reasoning, where generation uses a finite sliding context window, the window mechanism itself sufficiently breaks permutation symmetry. They introduce the abstract autoregressive HIST model, proving its Turing completeness by demonstrating its ability to simulate Turing-complete Post machines. Subsequently, they construct a sliding-window transformer using a constant-size token alphabet, entirely without PE, and show it can simulate the HIST model. This research concludes that positional encodings are not indispensable for universal computation in transformers, as the window sliding mechanism provides sufficient positional information.

Key takeaway

For AI Scientists and Machine Learning Engineers designing efficient long-context models, you should reconsider the necessity of explicit positional encodings. This research suggests that sliding-window attention inherently provides sufficient positional information for universal computation. You could simplify transformer architectures by omitting PE, potentially reducing model complexity and memory footprint, especially when working with finite context windows. Explore architectures that leverage the sliding window mechanism for implicit positional awareness.

Key insights

Sliding-window transformers can achieve Turing completeness without explicit positional encodings due to inherent window-induced asymmetry.

Principles

Sliding windows inherently break permutation symmetry.
Positional encoding is not always indispensable for universal computation.
Window evolution can implicitly reveal token positional information.

Method

The HIST model, an abstract autoregressive model, simulates Turing-complete Post machines by revealing tokens leaving the window, which a PE-free sliding-window transformer then simulates.

In practice

Design transformer architectures without explicit positional encoding.
Utilize sliding windows for implicit positional awareness in long sequences.

Topics

Positional Encoding
Transformers
Turing Completeness
Sliding Window Attention
Autoregressive Models

Code references

yuyijiong/sliding-window-attention-adaptation

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.