Three-Phase Transformer

2026-04-15 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Three-Phase Transformer (3PT) is a residual-stream structural prior for decoder-only Transformers, built on a standard SwiGLU + RMSNorm + RoPE + GQA backbone. It is not a modular add-on. Instead, the architecture behaves as a self-stabilizing equilibrium between scrambling and re-imposition of its channel structure. It partitions the hidden vector into N cyclic channels, each handled by phase-respecting operations such as per-channel RMSNorm and 2D Givens rotations. It also reserves a one-dimensional DC subspace, orthogonal to the channels, into which a fixed Gabriel's horn profile is injected as an absolute-position side-channel. The canonical N=3 configuration is inspired by balanced three-phase AC systems. On WikiText-103, a 123M-parameter 3PT model reduced perplexity by 7.20% (2.62% in bits-per-byte) relative to a RoPE-Only baseline. It also achieved a 1.93x speedup in step count to convergence.
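The balanced three-phase intuition behind N=3 can be checked numerically. The sketch below illustrates only the AC analogy, not 3PT's actual layer code. Three unit phasors 120° apart cancel exactly. Any vector of three phase values also splits into a zero-sum "balanced" part and a common-mode "DC" part that is orthogonal to it. The function names are hypothetical.

```python
import numpy as np

def phase_directions(n: int) -> np.ndarray:
    """Unit phasors for n evenly spaced phases, as an (n, 2) array of (cos, sin)."""
    angles = 2 * np.pi * np.arange(n) / n  # 0, 120, 240 degrees when n = 3
    return np.stack([np.cos(angles), np.sin(angles)], axis=1)

def split_balanced_and_dc(x: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Split x into a zero-sum (balanced) part and its common-mode (DC) part."""
    dc = np.full_like(x, x.mean())  # projection onto the all-ones direction
    balanced = x - dc               # the remainder sums to zero
    return balanced, dc

# Balanced three-phase: the three phasors cancel (up to floating-point error).
print(phase_directions(3).sum(axis=0))

# Instantaneous values of three balanced phases, plus a common offset of 0.5.
t = 0.3
x = np.cos(2 * np.pi * t - 2 * np.pi * np.arange(3) / 3) + 0.5
balanced, dc = split_balanced_and_dc(x)
print(balanced.sum(), dc[0], balanced @ dc)
```

The common offset lands entirely in the DC part. The balanced part carries the phase information and has zero inner product with it. This is the orthogonality that the DC subspace borrows from the analogy.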

Key takeaway

Research scientists optimizing Transformer architectures should consider evaluating the Three-Phase Transformer (3PT) design. In the reported 123M-parameter experiment, its channel-partitioned residual stream and DC-subspace injection delivered a 7.20% perplexity reduction and a 1.93x step-count convergence speedup, which could lower training cost. N=3 is a strong starting point, although N=1 performs comparably at larger scales.

Key insights

3PT introduces a self-stabilizing, channel-partitioned residual stream for Transformers, enhancing performance and convergence.

Principles

Self-stabilization without explicit enforcement
Orthogonal composition with existing mechanisms

Method

Partition hidden vectors into N cyclic channels, apply phase-respecting ops, inject a Gabriel's horn profile into an orthogonal DC subspace.
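The Method line can be read as a single residual-stream update. The sketch below is hypothetical throughout. Its illustrative assumptions are not details confirmed by the source:

- the layout, with hidden size d = N*c + 1 and the DC coordinate stored last
- a gain-free RMSNorm
- a single rotation angle `theta` shared by all channels
- `horn_profile(p) = 1/(p + 1)`, which is the radius 1/x of Gabriel's horn sampled at x = p + 1

"Phase-respecting" is interpreted here as applying the same operation to every channel, so cyclically shifting the channels commutes with the update.

```python
import numpy as np

def rms_norm(v: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Gain-free RMSNorm over the last axis: rescale each row to unit root-mean-square."""
    return v / np.sqrt(np.mean(v * v, axis=-1, keepdims=True) + eps)

def givens_rotate(v: np.ndarray, theta: float) -> np.ndarray:
    """Rotate each pair (v[..., 2i], v[..., 2i+1]) by angle theta; preserves norms."""
    c, s = np.cos(theta), np.sin(theta)
    a, b = v[..., 0::2], v[..., 1::2]
    out = np.empty_like(v)
    out[..., 0::2] = c * a - s * b
    out[..., 1::2] = s * a + c * b
    return out

def horn_profile(position: int) -> float:
    """Hypothetical horn profile: Gabriel's horn radius 1/x sampled at x = position + 1."""
    return 1.0 / (position + 1)

def three_phase_step(h: np.ndarray, position: int, n_channels: int = 3,
                     theta: float = 0.3) -> np.ndarray:
    """Hypothetical 3PT-style update on one hidden vector of size n_channels * c + 1.

    Assumes c is even. The first n_channels * c entries are the cyclic channels,
    and the last entry is the DC slot.
    """
    channels = h[:-1].reshape(n_channels, -1)  # (N, c): one row per cyclic channel
    channels = rms_norm(channels)              # per-channel RMSNorm
    channels = givens_rotate(channels, theta)  # same 2D rotation in every channel
    dc = h[-1] + horn_profile(position)        # inject the fixed profile into the DC slot only
    return np.concatenate([channels.reshape(-1), [dc]])

rng = np.random.default_rng(0)
h = rng.normal(size=3 * 4 + 1)                 # N = 3 channels of width 4, plus 1 DC coordinate
out = three_phase_step(h, position=9)
print(out.shape, out[-1] - h[-1])
```

The DC coordinate never enters the reshape, so the channel operations cannot read or write it. That is the sense in which it is orthogonal to the channels in this sketch.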

In practice

Reduces WikiText-103 perplexity by 7.20% versus a RoPE-Only baseline
Delivers a 1.93x speedup in step count to convergence
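The headline numbers can be sanity-checked with a little arithmetic. Assume both models share a tokenizer. Then bits-per-byte is proportional to per-token cross-entropy ln(PPL), so the relative BPB change equals ln(PPL_new / PPL_base) / ln(PPL_base). Solving for the baseline gives the perplexity at which the -7.20% PPL and -2.62% BPB changes are mutually consistent, which comes to about 17.3. This is an inference from the summary's numbers under that assumption, not a reported result. Likewise, a 1.93x step-count speedup means roughly 52% of the baseline's steps. The helper names are hypothetical.

```python
import math

def implied_baseline_ppl(ppl_rel_change: float, bpb_rel_change: float) -> float:
    """Baseline perplexity at which the two relative changes are consistent.

    Assumes a shared tokenizer, so bits-per-byte is proportional to ln(PPL) and
    bpb_rel_change = ln(PPL_new / PPL_base) / ln(PPL_base).
    """
    delta_nats = math.log1p(ppl_rel_change)       # ln(PPL_new / PPL_base), nats per token
    return math.exp(delta_nats / bpb_rel_change)  # solve for ln(PPL_base), then exponentiate

def step_fraction(speedup: float) -> float:
    """Fraction of baseline optimizer steps needed when convergence is `speedup` times faster."""
    return 1.0 / speedup

print(round(implied_baseline_ppl(-0.0720, -0.0262), 1))
print(round(step_fraction(1.93), 3))
```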

Topics

Three-Phase Transformer (3PT)
Residual Stream Architecture
Hidden Vector Partition
Gabriel's Horn Profile
Perplexity Reduction

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.