Sessa: Selective State Space Attention

2026-04-20 · Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, medium

Summary

Liubomyr Horbatko introduces Sessa, a decoder architecture that places attention inside a feedback path, enabling recurrent many-path aggregation within a layer. Transformers dilute the influence of individual tokens when attention is diffuse, and selective state-space models such as Mamba show exponential decay of long-range sensitivity. Sessa instead maintains a power-law memory tail of order $O(\ell^{-\beta})$ for $0<\beta<1$, where $\ell$ is the lag. This rate decays more slowly than $1/\ell$ and is tight in diffuse uniform-routing scenarios. Sessa also supports flexible selective retrieval, including non-decaying profiles. Under matched architectures and training budgets, it achieves superior performance on long-context benchmarks while remaining competitive with Transformer and Mamba baselines on short-context language modeling.

Key takeaway

For research scientists developing sequence models, Sessa is an alternative to Transformers and Mamba worth considering for tasks that depend on long-range context. Its power-law memory tail and flexible selective retrieval target two limits of existing architectures: token dilution under diffuse attention and exponential forgetting in selective state-space models. You should evaluate Sessa for applications where sustained long-term memory is critical.

Key insights

Sessa integrates attention into a recurrent feedback path, giving a long-range memory that decays more slowly than diffuse Transformer attention or Mamba.

Principles

Attention within feedback paths improves long-range memory.
Power-law memory decay ($O(\ell^{-\beta})$ with $0<\beta<1$) fades more slowly than $O(1/\ell)$, so distant tokens retain more influence.
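
To make the comparison concrete, the short sketch below tabulates the three decay regimes named in the summary at increasing lags. The exponent `beta = 0.5` and the exponential rate `0.01` are illustrative assumptions, not values from the paper, and the function names are hypothetical.

```python
import math

def power_law_tail(lag: int, beta: float = 0.5) -> float:
    """Power-law tail lag**(-beta); beta = 0.5 is an assumed example value."""
    return lag ** (-beta)

def diffuse_attention_tail(lag: int) -> float:
    """1/lag dilution, the reference rate for diffuse attention."""
    return 1.0 / lag

def exponential_tail(lag: int, rate: float = 0.01) -> float:
    """Exponential decay exp(-rate * lag); rate = 0.01 is an assumed example value."""
    return math.exp(-rate * lag)

lags = [10, 100, 1_000, 10_000, 100_000]

print(f"{'lag':>8} {'power-law':>12} {'1/lag':>12} {'exponential':>12}")
for lag in lags:
    print(
        f"{lag:>8} "
        f"{power_law_tail(lag):>12.3e} "
        f"{diffuse_attention_tail(lag):>12.3e} "
        f"{exponential_tail(lag):>12.3e}"
    )
```

The ordering is asymptotic: at short lags a slow exponential can still sit above both other curves, but at large lags the power-law tail dominates $1/\ell$, which in turn dominates any exponential.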

Method

Sessa places attention inside a feedback path to enable recurrent many-path aggregation within a layer, facilitating flexible selective retrieval.
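
The effect of feedback can be illustrated with a scalar toy recurrence. This is not the Sessa layer: it is a minimal sketch that assumes uniform routing weights $1/t$ over the $t$ earlier states and a hypothetical feedback gain `gain` in $(0, 1)$, so that $s_t = f_t + \frac{\text{gain}}{t}\sum_{j<t} s_j$. Because each state aggregates earlier states that themselves aggregated earlier ones, an impulse at position 0 reaches position $t$ along many paths. In this toy the response to that impulse decays like $t^{\text{gain}-1}$, a power law slower than the $1/t$ of a single uniform averaging pass.

```python
import math

def impulse_response(gain: float, length: int) -> list[float]:
    """Response s_t to a unit impulse f_0 = 1 under the toy recurrence
    s_t = f_t + gain * mean(s_0, ..., s_{t-1}).

    Toy assumption: uniform weights 1/t over the t earlier states."""
    states = [1.0]          # s_0 = f_0 = 1
    running_sum = 1.0       # sum of s_0 .. s_{t-1}
    for t in range(1, length):
        s_t = gain * running_sum / t   # feedback over all earlier states
        states.append(s_t)
        running_sum += s_t
    return states

def loglog_slope(values: list[float], lo: int, hi: int) -> float:
    """Slope of log(values) against log(index) between two positions."""
    return (math.log(values[hi]) - math.log(values[lo])) / (math.log(hi) - math.log(lo))

gain = 0.5  # hypothetical feedback gain, chosen for illustration
response = impulse_response(gain, 100_001)
slope = loglog_slope(response, 1_000, 100_000)
print(f"estimated tail exponent: {slope:.3f} (toy prediction: {gain - 1:.3f})")
```

The relation between the exponent and `gain` is a property of this toy only; how the paper's $\beta$ depends on the actual architecture is not stated in this summary.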

In practice

Use Sessa for long-context language modeling tasks.
Consider Sessa for flexible selective retrieval needs.

Topics

Sessa Decoder
Selective State Space Models
Transformer Architecture
Power-Law Memory Tail
Recurrent Attention

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.