Models
Dynamic Linear Attention
The paper introduces Dynamic Linear Attention (DLA), a framework that improves multi-state linear attention for large language models (LLMs) through two components: Information-Aware Dynamic State Merging and Capacity-Bounded Memory Modeling. DLA merges states adaptively according to token importance, which increases representation capacity for long contexts while keeping the memory cache at a fixed size. In experiments across 16 datasets, the authors report that DLA outperforms existing state-of-the-art methods, which makes it relevant to practitioners working on long-context LLM performance.
linear-attention, llm, scalability
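
To make the idea of importance-based merging under a fixed memory budget concrete, here is a minimal sketch. It is not the paper's algorithm: the summary does not specify how DLA scores importance or combines states, so the names `Slot` and `BoundedStateCache`, the rule of merging the adjacent pair with the lowest combined importance, and the importance-weighted average are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Slot:
    state: List[float]  # summary vector for the tokens merged into this slot
    weight: float       # accumulated importance of those tokens


class BoundedStateCache:
    """Toy cache that never holds more than `capacity` slots (hypothetical)."""

    def __init__(self, capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.slots: List[Slot] = []

    def add(self, state: List[float], importance: float) -> None:
        """Append a new state, then merge once if the budget is exceeded."""
        if importance <= 0:
            raise ValueError("importance must be positive")
        self.slots.append(Slot(list(state), importance))
        if len(self.slots) > self.capacity:
            self._merge_least_important_pair()

    def _merge_least_important_pair(self) -> None:
        # Assumed policy: merge the adjacent pair whose combined importance
        # is smallest, so high-importance states stay separate the longest.
        i = min(
            range(len(self.slots) - 1),
            key=lambda j: self.slots[j].weight + self.slots[j + 1].weight,
        )
        a, b = self.slots[i], self.slots[i + 1]
        total = a.weight + b.weight
        # Assumed merge rule: importance-weighted average of the two states.
        merged = [
            (a.weight * x + b.weight * y) / total
            for x, y in zip(a.state, b.state)
        ]
        self.slots[i:i + 2] = [Slot(merged, total)]


cache = BoundedStateCache(capacity=2)
cache.add([1.0, 0.0], importance=1.0)
cache.add([0.0, 1.0], importance=3.0)
cache.add([4.0, 4.0], importance=10.0)  # exceeds the budget, triggers a merge
print(len(cache.slots), cache.slots[0].state)
```

In this toy run the two low-importance states are folded into one slot while the high-importance state keeps a slot of its own, so the cache size stays at the configured capacity regardless of how many tokens arrive.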