The Reservoir Attention Network: Cross-Pass State in Pretrained Transformers via Content-Addressable Reservoir Injection

Source: Artificial Intelligence · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The Reservoir Attention Network (RAN) is an architectural study of whether a fixed, randomly initialized reservoir, injected into the mid-layer attention of a pretrained transformer, can carry state across forward passes. It is framed as a feasibility and dynamics study: experiments ran on GPT-2 (124M, 355M) and Qwen2.5 (0.5B, 1.5B) on a single consumer GPU, using minimal probe tasks designed to isolate individual mechanisms. The reservoir is deliberately left untrained (fixed random) so that the study tests whether untrained recurrent dynamics alone suffice to carry usable cross-pass state. Trained recurrence is positioned as a more complex, complementary direction for future work.
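To make "untrained recurrent dynamics" concrete, here is a minimal sketch of a fixed random reservoir in the echo-state style. It is an illustration under stated assumptions, not the study's implementation: the summary does not give the update rule, so the tanh recurrence, the spectral-radius rescaling, and the names `FixedReservoir`, `w_rec`, `w_in`, and `step` are all hypothetical.

```python
import numpy as np


class FixedReservoir:
    """Untrained echo-state-style reservoir (hypothetical illustration)."""

    def __init__(self, input_dim, size, spectral_radius=0.9, seed=0):
        rng = np.random.default_rng(seed)
        w = rng.standard_normal((size, size)) / np.sqrt(size)
        # Assumption: rescale so the largest eigenvalue magnitude equals
        # spectral_radius, a common echo-state heuristic for stable dynamics.
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w_rec = w  # fixed recurrent weights, never trained
        self.w_in = rng.standard_normal((size, input_dim)) / np.sqrt(input_dim)
        self.state = np.zeros(size)  # persists between forward passes

    def step(self, x):
        """Fold one forward pass's summary vector x into the persistent state."""
        self.state = np.tanh(self.w_rec @ self.state + self.w_in @ x)
        return self.state


# Two reservoirs with identical weights see different first passes,
# then the same second pass; their states still differ afterwards.
a, b = FixedReservoir(8, 32), FixedReservoir(8, 32)
a.step(np.ones(8))
b.step(-np.ones(8))
state_a = a.step(np.zeros(8))
state_b = b.step(np.zeros(8))
print(np.allclose(state_a, state_b))
```

Because no weight is updated, any difference between the two final states comes only from the recurrent dynamics, which is the property the study sets out to probe.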

Key takeaway

For machine learning engineers exploring efficient stateful transformer architectures, this study indicates that untrained recurrent dynamics, supplied by a fixed reservoir, can carry state across forward passes. Consider testing a randomly initialized reservoir injected into your own pretrained models, as was done with GPT-2 and Qwen2.5, to see whether it provides memory across interactions without the overhead of training the recurrent component. The evidence comes from small models and minimal probe tasks, so treat the approach as a possible path toward always-alive agent capabilities.

Key insights

The Reservoir Attention Network enables cross-pass state in pretrained transformers using an untrained, fixed reservoir.

Principles

Method

Inject a fixed, randomly-initialized reservoir into the mid-layer attention of a pretrained transformer to establish cross-pass state without training the reservoir itself.
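One way to picture the injection step is to let a mid-layer attention head attend over reservoir slots in addition to the current tokens. The sketch below assumes this is done by prepending the slots as extra keys and values. The summary does not specify the mechanism, so this is an assumed design, and `attention_with_reservoir`, `res_k`, and `res_v` are hypothetical names. In a real model the token projections would come from the pretrained layer rather than random arrays.

```python
import numpy as np


def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_reservoir(q, k, v, res_k, res_v):
    """Single-head causal attention extended with reservoir slots.

    q, k, v: (T, d) token projections for the current pass.
    res_k, res_v: (M, d) reservoir slots carried over from earlier passes.
    """
    T, M = q.shape[0], res_k.shape[0]
    keys = np.concatenate([res_k, k], axis=0)  # (M + T, d)
    vals = np.concatenate([res_v, v], axis=0)  # (M + T, d)
    scores = q @ keys.T / np.sqrt(q.shape[-1])  # (T, M + T)
    # Causal mask applies to the token columns only; every position
    # may read every reservoir slot (content-addressable lookup).
    future = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[:, M:] = np.where(future, -np.inf, scores[:, M:])
    weights = softmax(scores, axis=-1)
    return weights @ vals, weights


rng = np.random.default_rng(0)
T, M, d = 4, 3, 8
q, k, v = (rng.standard_normal((T, d)) for _ in range(3))
res_k = rng.standard_normal((M, d))
res_v = rng.standard_normal((M, d))
out, weights = attention_with_reservoir(q, k, v, res_k, res_v)
print(out.shape, weights[:, :M].sum(axis=1))  # attention mass on the reservoir
```

The pretrained weights are untouched in this arrangement. The only thing that changes between passes is the content of the slots, so the attention mass on the first `M` columns gives a direct readout of how much each position relies on carried-over state.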

Topics

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.