Three-Phase Transformer
Summary
Three-Phase Transformer (3PT) is a novel residual-stream structural prior for decoder-only Transformers, built on a standard SwiGLU + RMSNorm + RoPE + GQA backbone. The authors describe the architecture as a self-stabilizing equilibrium between scrambling and re-imposition of this structure, rather than as a modular add-on. The hidden vector is partitioned into N cyclic channels, each maintained by phase-respecting operations such as per-channel RMSNorm and 2D Givens rotations. The design also includes a one-dimensional DC subspace, orthogonal to the channels, into which a fixed Gabriel's horn profile is injected as an absolute-position side-channel. The canonical N=3 configuration is inspired by balanced three-phase AC systems. On WikiText-103, a 123M-parameter 3PT model reduced perplexity by 7.20% (2.62% in bits-per-byte) relative to a RoPE-Only baseline and converged 1.93x faster in step count.
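A minimal sketch of the phase-respecting operations follows, assuming contiguous equal-width channels, RMSNorm without a learned gain, and a rotation of channel k by theta + 2*pi*k/N (the 120-degree offsets of balanced three-phase AC when N=3). The function names and this phase schedule are hypothetical, not taken from the paper. What it illustrates is that every operation acts inside a single channel, so a channel's output depends only on that channel's input.

```python
import math
from typing import List

Vector = List[float]


def split_channels(h: Vector, n: int) -> List[Vector]:
    """Partition a hidden vector into n equal, contiguous channels (layout is an assumption)."""
    if len(h) % n != 0:
        raise ValueError("hidden size must be divisible by the number of channels")
    c = len(h) // n
    return [h[k * c:(k + 1) * c] for k in range(n)]


def channel_rmsnorm(x: Vector, eps: float = 1e-6) -> Vector:
    """RMSNorm over one channel only, so channels never share statistics (no learned gain)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [v / rms for v in x]


def givens_rotate(x: Vector, theta: float) -> Vector:
    """Apply a 2D Givens rotation by theta to each coordinate pair (x[2i], x[2i+1])."""
    if len(x) % 2 != 0:
        raise ValueError("channel width must be even")
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    out: Vector = []
    for i in range(0, len(x), 2):
        a, b = x[i], x[i + 1]
        out.extend([cos_t * a - sin_t * b, sin_t * a + cos_t * b])
    return out


def phase_respecting_block(h: Vector, n: int = 3, theta: float = 0.3) -> Vector:
    """Per-channel RMSNorm, then rotate channel k by theta + 2*pi*k/n (hypothetical phase offsets)."""
    out: Vector = []
    for k, ch in enumerate(split_channels(h, n)):
        out.extend(givens_rotate(channel_rmsnorm(ch), theta + 2 * math.pi * k / n))
    return out


h = [float(i) for i in range(1, 13)]  # d = 12, N = 3 -> three channels of width 4
y = phase_respecting_block(h)
```

Because Givens rotations preserve the norm, each channel of `y` keeps the unit RMS set by its own RMSNorm, and changing the values in one channel of `h` leaves the other channels of `y` untouched.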
Key takeaway
For research scientists optimizing Transformer architectures, the Three-Phase Transformer (3PT) design is worth evaluating. Its channel-partitioned residual stream and DC subspace injection gave a 7.20% perplexity reduction and a 1.93x step-count convergence speedup over a RoPE-Only baseline at 123M parameters, which could translate into lower training cost. N=3 is a strong starting point, though N=1 performs comparably at larger scales.
Key insights
3PT introduces a self-stabilizing, channel-partitioned residual stream for Transformers, enhancing performance and convergence.
Principles
- Self-stabilization without explicit enforcement
- Orthogonal composition with existing mechanisms
Method
Partition hidden vectors into N cyclic channels, apply phase-respecting ops, inject a Gabriel's horn profile into an orthogonal DC subspace.
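The DC side-channel can be sketched the same way. This sketch assumes the Gabriel's horn profile is the curve r(x) = 1/x sampled at x = position + 1, that the DC subspace is spanned by one fixed vector `dc_dir`, and that any existing component along that direction is removed before the profile is written, which keeps the channel content orthogonal to it. The scale, the position-to-x mapping, and all names here are hypothetical rather than the paper's exact construction.

```python
import math
from typing import List

Vector = List[float]


def horn_profile(pos: int) -> float:
    """Gabriel's horn radius r(x) = 1/x at x = pos + 1 (assumed mapping for 0-indexed positions)."""
    return 1.0 / (pos + 1)


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def inject_dc(h: Vector, pos: int, dc_dir: Vector, scale: float = 1.0) -> Vector:
    """Remove h's component along dc_dir, then write scale * horn_profile(pos) along it."""
    norm = math.sqrt(dot(dc_dir, dc_dir))
    u = [v / norm for v in dc_dir]  # unit vector spanning the 1-D DC subspace
    proj = dot(h, u)                # existing DC component of h
    value = scale * horn_profile(pos)
    return [hi - proj * ui + value * ui for hi, ui in zip(h, u)]


e_last = [0.0] * 7 + [1.0]  # hypothetical choice: last coordinate is the DC direction
h = [0.5, -1.0, 2.0, 0.0, 1.5, -0.5, 3.0, 9.0]
out = inject_dc(h, pos=3, dc_dir=e_last)
```

Here the first seven coordinates pass through unchanged and the eighth becomes 1/4. Since the 1/x profile stays in (0, 1], it can mark absolute position without growing with sequence length.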
In practice
- Achieves 7.20% lower perplexity than a RoPE-Only baseline on WikiText-103 (123M parameters)
- Converges 1.93x faster in training steps than that baseline
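The perplexity and bits-per-byte figures measure the same loss improvement on different scales. Perplexity is exp(loss), so a relative perplexity change r corresponds to a per-token loss change of ln(1 + r) nats, while bits-per-byte is linear in that loss. Assuming a fixed tokenizer, the relative BPB change is the loss change divided by the baseline loss, which is why the BPB percentage is smaller whenever the baseline loss exceeds about 1 nat per token. The source does not define how the step-count speedup is measured, so `step_speedup` below uses a hypothetical definition, and its toy loss curves are made up.

```python
import math
from typing import Sequence


def ppl_change_to_nats(rel_ppl_change: float) -> float:
    """Per-token cross-entropy change (nats) implied by a relative perplexity change (PPL = exp(loss))."""
    return math.log1p(rel_ppl_change)


def bpb_relative_change(delta_nats: float, baseline_nats: float) -> float:
    """Relative bits-per-byte change, assuming a fixed tokenizer.

    BPB = loss_nats * tokens / (bytes * ln 2) is linear in the per-token loss,
    so the tokens/bytes and ln 2 factors cancel in the ratio.
    """
    return delta_nats / baseline_nats


def first_step_reaching(curve: Sequence[float], target: float) -> int:
    """1-based index of the first logged step whose loss is at or below target."""
    for i, loss in enumerate(curve):
        if loss <= target:
            return i + 1
    raise ValueError("curve never reaches the target loss")


def step_speedup(baseline: Sequence[float], candidate: Sequence[float]) -> float:
    """Hypothetical definition: steps for the baseline to reach its final loss divided by the
    steps for the candidate to reach that same loss (both logged at the same interval)."""
    target = baseline[-1]
    return first_step_reaching(baseline, target) / first_step_reaching(candidate, target)


print(round(ppl_change_to_nats(-0.072), 4))  # -0.0747
```

A 7.20% perplexity reduction therefore corresponds to roughly 0.0747 fewer nats per token.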
Topics
- Three-Phase Transformer (3PT)
- Residual Stream Architecture
- Hidden Vector Partition
- Gabriel's Horn Profile
- Perplexity Reduction
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.