Research
The Reservoir Attention Network: Cross-Pass State in Pretrained Transformers via Content-Addressable Reservoir Injection
The article presents the Reservoir Attention Network (RAN), which injects a fixed, randomly initialized reservoir into the mid-layer attention of pretrained transformers such as GPT-2 and Qwen2.5 so that state persists across forward passes. The study reports that untrained recurrent dynamics can maintain this cross-pass state with no additional training, which may make RAN a computationally efficient alternative for extending transformer architectures. The result may also inform how future models manage state in resource-constrained environments.
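To make the idea concrete, here is a minimal sketch of a fixed random reservoir that is written to after one forward pass and read by content on the next. It is not the authors' implementation: the class `Reservoir`, its `write` and `read` methods, the slot layout, the mean-pooled write, the leaky tanh update, and the spectral-radius scaling are all assumptions borrowed from standard echo state networks, and plain NumPy arrays stand in for the mid-layer hidden states of a model such as GPT-2.

```python
import numpy as np


class Reservoir:
    """Hypothetical fixed reservoir with a content-addressable read (illustration only)."""

    def __init__(self, d_model, n_slots=8, d_res=64, leak=0.5,
                 spectral_radius=0.9, gate=0.1, seed=0):
        rng = np.random.default_rng(seed)
        # Fixed recurrent weights, rescaled so the largest eigenvalue
        # magnitude equals spectral_radius (a common echo-state heuristic).
        W = rng.standard_normal((d_res, d_res))
        W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))
        self.W = W
        # Fixed per-slot input projections and a shared readout; none are trained.
        self.W_in = rng.standard_normal((n_slots, d_res, d_model)) / np.sqrt(d_model)
        self.W_out = rng.standard_normal((d_model, d_res)) / np.sqrt(d_res)
        self.leak = leak
        self.gate = gate
        # The state is the only thing that changes between forward passes.
        self.state = np.zeros((n_slots, d_res))

    def write(self, hidden):
        """Update the state from one pass's hidden states, shape (seq_len, d_model)."""
        u = hidden.mean(axis=0)          # assumed pooling: mean over positions
        drive = self.W_in @ u            # (n_slots, d_res)
        target = np.tanh(self.state @ self.W.T + drive)
        self.state = (1.0 - self.leak) * self.state + self.leak * target

    def read(self, hidden):
        """Attend from hidden states over reservoir slots and add the result."""
        d_model = hidden.shape[-1]
        slots = self.state @ self.W_out.T              # (n_slots, d_model)
        scores = hidden @ slots.T / np.sqrt(d_model)   # (seq_len, n_slots)
        scores = scores - scores.max(axis=-1, keepdims=True)
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=-1, keepdims=True)
        return hidden + self.gate * (weights @ slots)


# Two consecutive "forward passes" sharing one reservoir.
rng = np.random.default_rng(1)
reservoir = Reservoir(d_model=16)
pass_one = rng.standard_normal((5, 16))
reservoir.write(pass_one)                # state now carries a trace of pass one
pass_two = rng.standard_normal((3, 16))
injected = reservoir.read(pass_two)      # pass two sees that trace
```

In this sketch the only mutable quantity is `reservoir.state`; every weight matrix stays at its random initialization, which mirrors the summary's claim that no additional training is needed. In a real model the `read` step would presumably sit inside a mid-layer attention block rather than act on standalone arrays.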
transformers, reservoir, attention