Inference · arXiv cs.AI · 10 d ago

Learning in the Recurrent State: Gradient Descent with Linear Recurrent Networks

The paper introduces the Gradient-based Recurrent In-context Learner (GRIL), an architecture for linear recurrent networks (LRNNs) that combines a diagonal recurrent state, a multiplicative readout, and sliding-window cross-product self-attention to perform in-context gradient descent. GRIL carries out minibatch gradient descent within a single forward pass, and the authors report empirical success on synthetic tasks and on benchmarks such as Long Range Arena. The design is presented as a practical inductive bias for sequence modeling and classification with LRNNs.
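To make the idea of gradient descent inside a recurrent state concrete, here is a minimal sketch. It is not GRIL itself, since this summary does not give the paper's equations. It only illustrates a well-known special case: for least-squares regression starting from zero weights, one minibatch gradient step equals a scaled running sum of `y_t * x_t`, which a linear recurrence with an elementwise (diagonal) transition can accumulate in one pass. The function names, the zero initialization, and the fixed decay of 1 are all assumptions made for illustration.

```python
def gd_step_explicit(xs, ys, lr):
    """One minibatch gradient step on 0.5 * mean((w.x - y)^2), starting from w = 0."""
    n, d = len(xs), len(xs[0])
    w = [0.0] * d
    grad = [0.0] * d
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y  # residual at w = 0
        for j in range(d):
            grad[j] += err * x[j] / n
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def gd_step_recurrent(xs, ys, lr):
    """Same weights, computed as a linear recurrence over the context pairs.

    Assumption for illustration: the state update is elementwise, i.e. a
    diagonal transition with every decay factor fixed to 1:
        s_t = 1 * s_{t-1} + y_t * x_t
    """
    state = [0.0] * len(xs[0])
    for x, y in zip(xs, ys):
        state = [s + y * xj for s, xj in zip(state, x)]
    return [lr * s / len(xs) for s in state]


def predict(w, x_query):
    """Read out a prediction by multiplying the learned weights with the query."""
    return sum(wi * xi for wi, xi in zip(w, x_query))


# Hypothetical toy context: three (x, y) pairs generated by w* = [2, -1].
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
ys = [2.0, -1.0, 1.0]

w_explicit = gd_step_explicit(xs, ys, lr=0.5)
w_recurrent = gd_step_recurrent(xs, ys, lr=0.5)
print(w_explicit, w_recurrent, predict(w_recurrent, [1.0, 2.0]))
```

Both functions return the same weights up to floating-point error, so the whole "training step" happens while scanning the context once. Handling a nonzero starting point, multiple steps, or windowed attention would need additional state and is outside what this sketch shows.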

gradient descent · recurrent networks
