The Reservoir Attention Network: Cross-Pass State in Pretrained Transformers via Content-Addressable Reservoir Injection
Summary
The Reservoir Attention Network (RAN) is an architectural study of injecting a fixed, randomly initialized reservoir into the mid-layer attention of pretrained transformers so that state is carried across forward passes. As a feasibility and dynamics study, it ran experiments on GPT-2 (124M and 355M) and Qwen2.5 (0.5B and 1.5B) on a single consumer GPU, using minimal probe tasks designed to isolate individual mechanisms. A key design choice was to leave the reservoir untrained (fixed random weights), so the study tests whether untrained recurrent dynamics alone can carry usable cross-pass state; trained recurrence is left as a more complex, complementary future direction.
Key takeaway
For Machine Learning Engineers exploring efficient stateful transformer architectures, this study demonstrates that even untrained recurrent dynamics, supplied by a fixed reservoir, can carry cross-pass state. Consider experimenting with injecting such a randomly initialized reservoir into your own pretrained models, as the study did with GPT-2 and Qwen2.5, to obtain memory across interactions without the overhead of training the recurrent component. The approach points toward always-alive agents that keep state between interactions.
Key insights
The Reservoir Attention Network enables cross-pass state in pretrained transformers using an untrained, fixed reservoir.
Principles
- Untrained recurrent dynamics can carry cross-pass state.
- A fixed random reservoir can be injected into mid-layer attention.
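The summary does not describe the reservoir's internals. As a minimal sketch, assuming a standard echo-state-style reservoir, the idea that untrained dynamics can carry state might look like the following: the recurrent and input weights are drawn once from a seeded random generator, rescaled to a spectral radius below 1, and never updated; only the state vector changes, once per forward pass. The class name `FixedReservoir`, the mean-pooled input, the `tanh` update, and all sizes are hypothetical choices for illustration, not the author's method.

```python
import numpy as np


class FixedReservoir:
    """Echo-state-style reservoir: weights are drawn once and never trained.

    Hypothetical sketch; sizes, scaling, and pooling are assumptions.
    """

    def __init__(self, d_model: int, n_res: int = 256,
                 spectral_radius: float = 0.9, seed: int = 0):
        rng = np.random.default_rng(seed)
        w = rng.standard_normal((n_res, n_res))
        # Rescale so the largest |eigenvalue| equals spectral_radius;
        # a value below 1 is the usual choice for fading, stable dynamics.
        w *= spectral_radius / np.max(np.abs(np.linalg.eigvals(w)))
        self.w_rec = w                                   # fixed recurrent weights
        self.w_in = rng.standard_normal((n_res, d_model)) / np.sqrt(d_model)  # fixed input weights
        self.state = np.zeros(n_res)                     # the only thing that changes

    def step(self, hidden_states: np.ndarray) -> np.ndarray:
        """Advance one step per forward pass; hidden_states has shape (T, d_model)."""
        u = hidden_states.mean(axis=0)                   # mean-pool the pass into one vector
        self.state = np.tanh(self.w_rec @ self.state + self.w_in @ u)
        return self.state


# Usage: three simulated passes of 10 tokens each through a 768-wide model.
res = FixedReservoir(d_model=768)
rng = np.random.default_rng(1)
for _ in range(3):
    state = res.step(rng.standard_normal((10, 768)))
print(state.shape)  # (256,)
```

Nothing in this class has a gradient or an optimizer step; whatever information survives from one pass to the next does so purely through the fixed recurrent map, which is the property the study sets out to probe.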
Method
Inject a fixed, randomly initialized reservoir into the mid-layer attention of a pretrained transformer to establish cross-pass state without training the reservoir itself.
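How the reservoir enters attention is not specified in this summary. One plausible reading of "content-addressable" injection, sketched here purely as an assumption, is to project the reservoir state into a few extra key/value slots that every token query can attend to at the chosen layer, so retrieval is driven by query-key similarity rather than by position. The function `attention_with_reservoir` and the projection matrices `w_mk` and `w_mv` are hypothetical; the sketch is single-head NumPy and ignores batching and multi-head splitting.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_with_reservoir(q, k, v, res_state, w_mk, w_mv, n_slots):
    """Single-head causal attention with reservoir-derived memory slots.

    q, k, v:    (T, d) token queries/keys/values at the chosen mid layer.
    res_state:  (n_res,) reservoir state carried over from earlier passes.
    w_mk, w_mv: (n_slots * d, n_res) hypothetical projections from the
                reservoir state to memory keys and values.
    """
    T, d = q.shape
    mem_k = (w_mk @ res_state).reshape(n_slots, d)
    mem_v = (w_mv @ res_state).reshape(n_slots, d)
    keys = np.concatenate([mem_k, k], axis=0)        # (n_slots + T, d)
    values = np.concatenate([mem_v, v], axis=0)
    scores = q @ keys.T / np.sqrt(d)                 # (T, n_slots + T)
    # Causal mask over token positions only; memory slots are visible to every token.
    causal = np.triu(np.ones((T, T), dtype=bool), k=1)
    mask = np.concatenate([np.zeros((T, n_slots), dtype=bool), causal], axis=1)
    scores = np.where(mask, -np.inf, scores)
    weights = softmax(scores, axis=-1)
    return weights @ values, weights
```

Because the memory slots sit outside the causal mask, every token can read the carried state, while the ordinary token-to-token mask is unchanged; a different reservoir state yields different memory keys and therefore a different attention output for the same prompt.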
In practice
- Test reservoir injection on GPT-2 and Qwen2.5.
- Run the architectural probes on a single consumer GPU.
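For orientation, a small configuration sketch listing the layer counts and hidden sizes of the four models the study used, taken from their public configs (`n_layer`/`n_embd` for GPT-2, `num_hidden_layers`/`hidden_size` for Qwen2.5). The summary does not say which layer counts as "mid", so the `mid_layer` helper, which assumes the block at index `n_layers // 2`, is hypothetical.

```python
# Public config values; "mid-layer" as n_layers // 2 is an assumption.
MODEL_CONFIGS = {
    "gpt2": {"n_layers": 12, "d_model": 768},            # GPT-2 124M
    "gpt2-medium": {"n_layers": 24, "d_model": 1024},    # GPT-2 355M
    "Qwen/Qwen2.5-0.5B": {"n_layers": 24, "d_model": 896},
    "Qwen/Qwen2.5-1.5B": {"n_layers": 28, "d_model": 1536},
}


def mid_layer(n_layers: int) -> int:
    """Zero-based index of the middle block (hypothetical definition of 'mid-layer')."""
    return n_layers // 2


for name, cfg in MODEL_CONFIGS.items():
    print(f"{name}: inject at block {mid_layer(cfg['n_layers'])} of {cfg['n_layers']}, "
          f"d_model={cfg['d_model']}")
```

The `d_model` column is what a reservoir's input projection would need to match for each model.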
Topics
- Reservoir Attention Network
- Pretrained Transformers
- Cross-Pass State
- Recurrent Dynamics
- GPT-2
- Qwen2.5
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.