Research · arXiv cs.CL · 11 d ago

Rethinking the Role of Efficient Attention in Hybrid Architectures

The paper systematically analyzes hybrid architectures that combine full attention with efficient attention modules such as sliding-window attention (SWA) and recurrent sequence mixers. It finds that the efficient-attention component affects how quickly long-context capability emerges, while full attention remains crucial for long-range retrieval, an interaction that gives rise to a phenomenon the authors term Large-Window Laziness. The study also shows that applying NoPE (no positional encoding) to the full-attention layers of a small-window SWA hybrid can significantly improve long-context performance without compromising short-context capabilities, a practical result for anyone tuning model architectures across different context lengths.
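
To make the terms concrete, here is a minimal sketch of the two attention patterns and of one possible hybrid layout. It is an illustration rather than the paper's implementation: the function names, the ratio of SWA to full-attention layers, the window size, and the use of RoPE on the SWA layers are all assumptions that the summary does not specify.

```python
def causal_mask(seq_len):
    """Full causal attention: query i may attend to every key j <= i."""
    return [[j <= i for j in range(seq_len)] for i in range(seq_len)]


def sliding_window_mask(seq_len, window):
    """Causal SWA: query i may attend only to the last `window` keys, itself included."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def hybrid_layer_plan(num_layers, full_every, window):
    """Hypothetical layout (an assumption, not taken from the paper):
    every `full_every`-th layer is full attention without positional
    encoding (NoPE); all other layers are small-window SWA with RoPE."""
    plan = []
    for idx in range(num_layers):
        if (idx + 1) % full_every == 0:
            plan.append({"layer": idx, "kind": "full", "positional": None, "window": None})
        else:
            plan.append({"layer": idx, "kind": "swa", "positional": "rope", "window": window})
    return plan


if __name__ == "__main__":
    # The last query sees all 6 keys under full attention but only 2 under SWA.
    print(sum(causal_mask(6)[-1]), sum(sliding_window_mask(6, 2)[-1]))
    for layer in hybrid_layer_plan(num_layers=8, full_every=4, window=128):
        print(layer)
```

In this layout only the full-attention layers can connect tokens that are further apart than the window, which is why the summary singles them out for long-range retrieval.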

attention · architecture · scaling
