Improved state mixing in higher-order and block diagonal linear recurrent networks
Summary
Igor Dubinin, Antonio Orvieto, and Felix Effenberger introduced two new structured Linear Recurrent Network (LRNN) architectures, Higher-order Linear Recurrent Units (H-LRU) and Block-Diagonal LRUs (BD-LRU), on February 12, 2026. LRNNs and linear state space models (SSMs) are efficient on long sequences, but their diagonal state transitions have traditionally limited their expressivity. The new models aim to raise that expressivity while keeping computational and memory efficiency. H-LRU generalizes first-order recurrence to higher orders by mixing multiple past states, and BD-LRU enables dense channel mixing within each block of the state. Both architectures use L1-normalization for stabilization and a parallel-scan implementation to keep throughput competitive. On synthetic sequence-modeling tasks, BD-LRU matched or surpassed Mamba, DeltaNet, and LSTM baselines, while H-LRU was the most parameter-efficient on compression tasks. Together, these results indicate that the structure of state mixing, not just model width, drives LRNN expressivity.
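To make the higher-order idea concrete, here is a minimal pure-Python sketch of an order-K recurrence for a single channel, h_t = a_1 h_{t-1} + ... + a_K h_{t-K} + b x_t. It assumes fixed per-channel scalar coefficients and one hypothetical L1 scheme (rescaling a_1..a_K so their absolute values sum to at most 1). The paper's actual parameterization, gating, and placement of the normalization may differ, and the names `l1_rescale` and `hlru_channel` are illustrative only.

```python
from typing import List


def l1_rescale(coeffs: List[float]) -> List[float]:
    """Shrink coefficients so that sum(|a_k|) <= 1 (hypothetical L1 scheme).

    Coefficients already inside the L1 unit ball are left unchanged.
    """
    total = sum(abs(c) for c in coeffs)
    scale = max(total, 1.0)
    return [c / scale for c in coeffs]


def hlru_channel(x: List[float], a: List[float], b: float) -> List[float]:
    """Order-K linear recurrence for one channel, starting from a zero state:

        h_t = a[0] * h_{t-1} + a[1] * h_{t-2} + ... + a[K-1] * h_{t-K} + b * x_t
    """
    a = l1_rescale(a)
    history = [0.0] * len(a)  # history[k] holds h_{t-1-k}
    out = []
    for x_t in x:
        h_t = sum(a_k * h_prev for a_k, h_prev in zip(a, history)) + b * x_t
        history = [h_t] + history[:-1]  # newest state first, oldest dropped
        out.append(h_t)
    return out


if __name__ == "__main__":
    # Order 2: each new state mixes the two most recent past states.
    print(hlru_channel([1.0, 0.0, 0.0, 0.0], a=[0.5, 0.25], b=1.0))
    # -> [1.0, 0.5, 0.5, 0.375]
```

With sum |a_k| <= 1, |h_t| can never exceed the largest recent |h| plus the current input term |b x_t|, which is one simple way a normalization of this kind keeps the recurrence from blowing up.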
Key takeaway
For research scientists developing efficient sequence models, consider integrating higher-order or block-diagonal state mixing into your LRNN designs. These architectures offer a practical path to closing the efficiency-expressivity gap, with BD-LRU excelling in synthetic sequence modeling and H-LRU in parameter efficiency for compression. Before simply increasing model width, evaluate whether structured state mixing gives a better return.
Key insights
Richer state mixing in LRNNs improves expressivity for long-sequence modeling while preserving efficiency.
Principles
- Higher-order recurrence improves state mixing.
- Block-diagonal structures enable dense channel mixing.
- L1-normalization stabilizes training in LRNNs.
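The block-diagonal principle can be sketched in the same way. Below, the state is split into independent blocks; within each block a small dense matrix mixes all of that block's channels, while channels in different blocks never interact. The row-wise L1 rescaling is an assumed stabilization scheme: capping each row's absolute sum at 1 bounds the matrix infinity-norm by 1, so the transition cannot expand the state in max-norm. The input projection is omitted, and `bdlru_step` and `l1_rescale_rows` are hypothetical names, not the paper's code.

```python
from typing import List

Matrix = List[List[float]]


def l1_rescale_rows(block: Matrix) -> Matrix:
    """Shrink each row so that its absolute values sum to at most 1 (assumed scheme)."""
    rescaled = []
    for row in block:
        scale = max(sum(abs(v) for v in row), 1.0)
        rescaled.append([v / scale for v in row])
    return rescaled


def bdlru_step(h: List[float], x: List[float], blocks: List[Matrix]) -> List[float]:
    """One step of h_t = blockdiag(A_1, ..., A_m) h_{t-1} + x_t.

    Each A_i is a dense square block that mixes only the channels it covers.
    """
    if sum(len(block) for block in blocks) != len(h) or len(x) != len(h):
        raise ValueError("block sizes must add up to the state width")
    new_h: List[float] = []
    offset = 0
    for block in blocks:
        A = l1_rescale_rows(block)
        size = len(A)
        h_blk = h[offset:offset + size]
        for i in range(size):
            mixed = sum(A[i][j] * h_blk[j] for j in range(size))  # dense intra-block mixing
            new_h.append(mixed + x[offset + i])
        offset += size
    return new_h


if __name__ == "__main__":
    blocks = [[[0.5, 0.5], [0.0, 1.0]], [[2.0]]]  # a 2x2 block and a 1x1 block
    print(bdlru_step([1.0, 2.0, 3.0], [0.0, 0.0, 0.0], blocks))
    # -> [1.5, 2.0, 3.0]  (the 1x1 block [[2.0]] is rescaled to [[1.0]])
```

A purely diagonal LRU is the special case where every block is 1x1; larger blocks buy cross-channel mixing at the cost of more parameters per block.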
Method
H-LRU generalizes first-order recurrence to higher orders, and BD-LRU enables dense channel mixing within blocks of the state. Both are stabilized by L1-normalization and implemented with a parallel scan for efficiency.
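The parallel-scan approach rests on one observation: each recurrence step is an affine map s -> A s + b, and composing affine maps is associative, so all prefix states can be computed in logarithmic depth. The sketch below assumes fixed coefficients, a zero initial state, and no normalization. It rewrites an order-K recurrence as a first-order one on a stacked state via a companion matrix, then runs a Hillis-Steele inclusive scan and checks it against a sequential loop. It runs serially in pure Python; a real implementation would execute each scan level in parallel on an accelerator, for example with a primitive such as `jax.lax.associative_scan`. All function names are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Affine = Tuple[Matrix, Vector]  # represents the map s -> A s + b


def matmul(A: Matrix, B: Matrix) -> Matrix:
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


def matvec(A: Matrix, v: Vector) -> Vector:
    return [sum(a * u for a, u in zip(row, v)) for row in A]


def combine(first: Affine, second: Affine) -> Affine:
    """Apply `first`, then `second`: s -> A2 (A1 s + b1) + b2. This is associative."""
    A1, b1 = first
    A2, b2 = second
    return matmul(A2, A1), [u + w for u, w in zip(matvec(A2, b1), b2)]


def companion(a: Vector) -> Matrix:
    """Matrix M with s_t = M s_{t-1} + [b*x_t, 0, ..., 0], where s_t = [h_t, ..., h_{t-K+1}]."""
    K = len(a)
    M = [[0.0] * K for _ in range(K)]
    M[0] = list(a)  # first row: the higher-order mixing coefficients
    for i in range(1, K):
        M[i][i - 1] = 1.0  # remaining rows shift older states down by one slot
    return M


def inclusive_scan(elems: List[Affine]) -> List[Affine]:
    """Hillis-Steele scan: O(log n) levels; updates within a level are independent."""
    out = list(elems)
    step = 1
    while step < len(out):
        out = [out[i] if i < step else combine(out[i - step], out[i])
               for i in range(len(out))]
        step *= 2
    return out


def hlru_via_scan(x: Vector, a: Vector, b: float) -> Vector:
    M = companion(a)
    steps = [(M, [b * x_t] + [0.0] * (len(a) - 1)) for x_t in x]
    # With a zero initial state, each prefix map applied to 0 is just its offset vector.
    return [offset[0] for _, offset in inclusive_scan(steps)]


def hlru_sequential(x: Vector, a: Vector, b: float) -> Vector:
    history, out = [0.0] * len(a), []
    for x_t in x:
        h_t = sum(c * h for c, h in zip(a, history)) + b * x_t
        history = [h_t] + history[:-1]
        out.append(h_t)
    return out


if __name__ == "__main__":
    x = [1.0, 0.0, 0.0, 0.0]
    print(hlru_via_scan(x, [0.5, 0.25], 1.0))    # -> [1.0, 0.5, 0.5, 0.375]
    print(hlru_sequential(x, [0.5, 0.25], 1.0))  # -> [1.0, 0.5, 0.5, 0.375]
```

The companion-matrix trick trades a K-dimensional matrix product per combine for the ability to scan; for a block-diagonal transition the same `combine` applies directly, with each block composed independently.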
In practice
- Use H-LRU for parameter-efficient compression tasks.
- Use BD-LRU for strong performance on synthetic sequence-modeling tasks.
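The trade-off between these structures can be illustrated with a back-of-the-envelope count of recurrent-transition parameters at state width N. This is a simplified, hypothetical accounting (one coefficient per channel for each of K past states in an H-LRU-style layer, N/B dense B x B blocks in a BD-LRU-style layer) that ignores input/output projections, gates, and biases, so it does not reproduce the paper's parameter counts.

```python
def transition_params(width: int, kind: str, order: int = 1, block: int = 1) -> int:
    """Number of recurrent-transition parameters under simplified assumptions."""
    if kind == "diagonal":        # one decay per channel (first-order LRU)
        return width
    if kind == "higher_order":    # one coefficient per channel per past state (H-LRU-style)
        return order * width
    if kind == "block_diagonal":  # width // block dense blocks of size block x block (BD-LRU-style)
        if width % block != 0:
            raise ValueError("block size must divide the state width")
        return (width // block) * block * block
    if kind == "dense":           # unrestricted width x width transition
        return width * width
    raise ValueError(f"unknown kind: {kind}")


if __name__ == "__main__":
    for kind, kwargs in [("diagonal", {}), ("higher_order", {"order": 2}),
                         ("block_diagonal", {"block": 8}), ("dense", {})]:
        print(kind, transition_params(256, kind, **kwargs))
    # diagonal 256, higher_order 512, block_diagonal 2048, dense 65536
```

Under these assumptions, a higher-order transition costs K times a diagonal one and a block-diagonal transition costs B times as much, both far below a dense N x N matrix; the same factors apply to per-step multiply-adds.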
Topics
- Linear Recurrent Networks
- State Space Models
- Sequence Modeling
- Model Expressivity
- Higher-order Recurrence
Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.