Improved state mixing in higher-order and block diagonal linear recurrent networks

2026-02-12 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

Igor Dubinin, Antonio Orvieto, and Felix Effenberger introduced two new structured Linear Recurrent Network (LRNN) architectures on February 12, 2026: Higher-order Linear Recurrent Units (H-LRU) and Block-Diagonal LRUs (BD-LRU). LRNNs and linear state space models (SSMs) are efficient on long sequences, but their diagonal state transitions limit expressivity. Both new models aim to increase expressivity while preserving that computational and memory efficiency. H-LRU generalizes the first-order recurrence to higher orders by mixing several past states, and BD-LRU enables dense channel mixing within each block of the state. Both architectures use L1-normalization for stability and a parallel-scan implementation for competitive throughput. On synthetic sequence modeling tasks, BD-LRU matched or surpassed Mamba, DeltaNet, and LSTM baselines, while H-LRU was the most parameter-efficient model on compression tasks. These results indicate that the structure of state mixing, not just model width, drives LRNN expressivity.
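The summary describes the three transition structures only in words. As a reading aid, the recurrences below are a sketch in our own notation and under our own assumptions: h_t is the state of width N, u_t is the input contribution at step t, ⊙ is elementwise multiplication, m is the recurrence order, and b is the block size. The paper's exact parameterization (input-dependent coefficients, gating, where the normalization is applied) may differ.

```latex
% Diagonal first-order LRU (baseline): each channel evolves independently
h_t = a_t \odot h_{t-1} + u_t, \qquad a_t \in \mathbb{R}^{N}

% H-LRU of order m (assumed form): each channel mixes its m most recent states
h_t = \sum_{k=1}^{m} a_t^{(k)} \odot h_{t-k} + u_t, \qquad a_t^{(k)} \in \mathbb{R}^{N}

% BD-LRU with block size b (assumed form): dense mixing inside each block only
h_t = \operatorname{blockdiag}\!\left(A_t^{(1)}, \dots, A_t^{(N/b)}\right) h_{t-1} + u_t,
\qquad A_t^{(j)} \in \mathbb{R}^{b \times b}
```

With m = 1 the H-LRU form reduces to the diagonal baseline, and with b = 1 so does the BD-LRU form.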

Key takeaway

For research scientists developing efficient sequence models: consider adding higher-order or block-diagonal state mixing to your LRNN designs. These architectures offer a practical way to narrow the efficiency-expressivity gap, with BD-LRU excelling at synthetic sequence modeling and H-LRU at parameter efficiency on compression tasks. Before simply increasing model width, evaluate whether structured state mixing yields a larger gain.
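To make the width-versus-structure comparison concrete, here is a back-of-the-envelope count of the transition coefficients each structure applies per step for a state of width N. It assumes a diagonal transition uses N coefficients, an order-m higher-order transition uses m·N, and a block-diagonal transition with block size b uses N·b. The function name and example sizes are hypothetical, and because the paper's coefficients may be generated from the input, actual model parameter counts will differ.

```python
def transition_coefficients(width: int, kind: str, order: int = 1, block: int = 1) -> int:
    """Count state-transition coefficients applied per step (hypothetical accounting)."""
    if kind == "diagonal":
        return width  # one decay per channel
    if kind == "higher_order":
        return order * width  # one diagonal transition per past state
    if kind == "block_diagonal":
        if width % block != 0:
            raise ValueError("width must be divisible by the block size")
        return (width // block) * block * block  # one dense b x b block per group of b channels
    raise ValueError(f"unknown kind: {kind!r}")


if __name__ == "__main__":
    width = 256
    rows = [
        ("diagonal LRU", transition_coefficients(width, "diagonal")),
        ("H-LRU, order 2", transition_coefficients(width, "higher_order", order=2)),
        ("BD-LRU, block 8", transition_coefficients(width, "block_diagonal", block=8)),
    ]
    for label, count in rows:
        print(f"{label:<16} {count:>6}")
```

Under this accounting, giving a diagonal model the same coefficient budget as a width-256 BD-LRU with block size 8 means an 8× wider state that still has no cross-channel mixing, which is the kind of comparison the takeaway suggests running.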

Key insights

Richer state mixing improves LRNN expressivity on long-sequence tasks while preserving efficiency.

Principles

Higher-order recurrence improves state mixing.
Block-diagonal structures enable dense channel mixing.
L1-normalization stabilizes training in LRNNs.
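The three principles can be illustrated together in a minimal pure-Python sketch. It assumes the recurrence forms h_t = sum_k a^(k) ⊙ h_{t-k} + u_t for H-LRU and h_t = blockdiag(A^(1), ..., A^(N/b)) h_{t-1} + u_t for BD-LRU, and it assumes "L1-normalization" means dividing each channel's coefficients (H-LRU) or each block row (BD-LRU) by their L1 norm. The helper names are hypothetical, not the authors' code. Under that normalization, no state entry can exceed the largest magnitude already present in the state, so with zero input the state never grows; this is one concrete sense in which such normalization stabilizes the recurrence.

```python
import random
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def l1_normalize(coeffs: Vector) -> Vector:
    """Divide coefficients by their L1 norm so that sum(|c|) == 1 (hypothetical helper)."""
    total = sum(abs(c) for c in coeffs)
    if total == 0.0:
        return list(coeffs)
    return [c / total for c in coeffs]


def hlru_step(history: List[Vector], coeffs: List[Vector], u: Vector) -> Vector:
    """One step of an order-m recurrence (assumed form).

    history[k] holds h_{t-1-k} (most recent state first); coeffs[i][k] is the
    weight channel i gives to history[k]; u is the input contribution.
    """
    return [
        sum(c * past[i] for c, past in zip(coeffs[i], history)) + u[i]
        for i in range(len(u))
    ]


def bdlru_step(h: Vector, blocks: List[Matrix], u: Vector) -> Vector:
    """One step with a block-diagonal transition (assumed form): channels mix only within a block."""
    out: Vector = []
    offset = 0
    for block in blocks:
        segment = h[offset:offset + len(block)]
        out.extend(sum(a * x for a, x in zip(row, segment)) for row in block)
        offset += len(block)
    return [o + ui for o, ui in zip(out, u)]


def max_abs(v: Vector) -> float:
    return max(abs(x) for x in v)


if __name__ == "__main__":
    rng = random.Random(0)
    n, order, block = 4, 2, 2
    zero = [0.0] * n

    # H-LRU: L1-normalize each channel's m coefficients, then run with zero input.
    coeffs = [l1_normalize([rng.uniform(-2, 2) for _ in range(order)]) for _ in range(n)]
    history = [[rng.uniform(-1, 1) for _ in range(n)] for _ in range(order)]
    bound = max(max_abs(s) for s in history)
    for _ in range(50):
        h = hlru_step(history, coeffs, zero)
        assert max_abs(h) <= bound + 1e-12  # state magnitude never grows
        history = [h] + history[:-1]

    # BD-LRU: L1-normalize every row of every block, then run with zero input.
    blocks = [[l1_normalize([rng.uniform(-2, 2) for _ in range(block)]) for _ in range(block)]
              for _ in range(n // block)]
    h = [rng.uniform(-1, 1) for _ in range(n)]
    bound = max_abs(h)
    for _ in range(50):
        h = bdlru_step(h, blocks, zero)
        assert max_abs(h) <= bound + 1e-12
    print("state magnitude never grew")
```

Dividing by the L1 norm is only one plausible choice; a variant that rescales only when the norm exceeds 1 would give the same bound while leaving already-contractive coefficients untouched.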

Method

H-LRU generalizes the first-order recurrence to higher orders, and BD-LRU enables dense channel mixing within blocks. Both are stabilized by L1-normalization and implemented with a parallel scan for efficiency.
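A parallel scan applies because each step h_t = A_t h_{t-1} + u_t is an affine map, and composing affine maps is associative. The sketch below is a generic Hillis-Steele inclusive scan in pure Python, assuming the recurrence can be written in that first-order form. An order-2 H-LRU channel can be lifted to first order by stacking [h_t, h_{t-1}] and using a 2×2 companion matrix, and products of block-diagonal matrices stay block-diagonal, so the same operator would also serve BD-LRU. This illustrates the technique; it is not the authors' kernel, which would run each level's independent updates on parallel hardware. The coefficients in the demo are hypothetical.

```python
from typing import List, Tuple

Matrix = List[List[float]]
Vector = List[float]
Elem = Tuple[Matrix, Vector]  # (A, u) represents the affine map h -> A h + u


def matmul(x: Matrix, y: Matrix) -> Matrix:
    return [[sum(x[i][k] * y[k][j] for k in range(len(y))) for j in range(len(y[0]))]
            for i in range(len(x))]


def matvec(x: Matrix, v: Vector) -> Vector:
    return [sum(a * b for a, b in zip(row, v)) for row in x]


def combine(first: Elem, second: Elem) -> Elem:
    """Compose two affine maps: apply `first`, then `second`. This operator is associative."""
    a1, u1 = first
    a2, u2 = second
    return matmul(a2, a1), [p + q for p, q in zip(matvec(a2, u1), u2)]


def parallel_scan(elems: List[Elem]) -> List[Elem]:
    """Hillis-Steele inclusive scan: ceil(log2 T) levels, each level's updates independent."""
    out = list(elems)
    step = 1
    while step < len(out):
        out = [out[i] if i < step else combine(out[i - step], out[i]) for i in range(len(out))]
        step *= 2
    return out


def sequential(elems: List[Elem], h0: Vector) -> List[Vector]:
    """Reference implementation: run the recurrence one step at a time."""
    states, h = [], h0
    for a, u in elems:
        h = [p + q for p, q in zip(matvec(a, h), u)]
        states.append(h)
    return states


if __name__ == "__main__":
    # Order-2 recurrence h_t = 0.6 h_{t-1} - 0.3 h_{t-2} + x_t in companion form;
    # the lifted state is [h_t, h_{t-1}].
    xs = [1.0, 0.5, -0.25, 2.0, 0.0, 1.5]
    elems = [([[0.6, -0.3], [1.0, 0.0]], [x, 0.0]) for x in xs]
    h0 = [0.0, 0.0]
    seq = sequential(elems, h0)
    par = [[p + q for p, q in zip(matvec(a, h0), u)] for a, u in parallel_scan(elems)]
    assert all(abs(p - q) < 1e-12 for s, r in zip(seq, par) for p, q in zip(s, r))
    print([round(s[0], 4) for s in par])
```

The scan needs a logarithmic number of levels instead of T sequential steps, at the cost of matrix-matrix products in `combine`; keeping A_t diagonal or small-block-diagonal is what keeps those products cheap.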

In practice

Use H-LRU for parameter-efficient compression tasks.
Employ BD-LRU for competitive synthetic sequence modeling.

Topics

Linear Recurrent Networks
State Space Models
Sequence Modeling
Model Expressivity
Higher-order Recurrence

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.