Research
RoVE: Rotary Value Embeddings Attention for Relative Position-dependent Value Pathways
The article introduces Rotary Value Embeddings (RoVE), a modification to Rotary Position Embeddings (RoPE) that makes the value pathway of attention position-sensitive. In RoVE, values are rotated together with keys, which effectively turns RoPE attention into an attentive convolution. Experiments training 124M- and 354M-parameter GPT-2 models show significant improvements in few-shot learning, out-of-distribution perplexity, and long-context retrieval, with the largest benefit on tasks that require aggregating information over long ranges.
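To make the idea concrete, here is a minimal single-head NumPy sketch of what rotating values alongside keys could look like. It is not taken from the article: the function names (`rope_angles`, `rotate`, `rove_attention`), the `offset` parameter, the half-split pairing of dimensions, and in particular the step that rotates the aggregated output back by the query position are all assumptions, chosen so that each value's contribution depends only on the relative position between query and key.

```python
import numpy as np

def rope_angles(seq_len, head_dim, offset=0, base=10000.0):
    """Rotation angles per position and frequency, shape (seq_len, head_dim // 2)."""
    half = head_dim // 2
    inv_freq = base ** (-np.arange(half) / half)
    positions = np.arange(seq_len) + offset
    return np.outer(positions, inv_freq)

def rotate(x, angles):
    """Rotate each (x[:, i], x[:, i + half]) pair by the given angle."""
    half = x.shape[-1] // 2
    x1, x2 = x[..., :half], x[..., half:]
    cos, sin = np.cos(angles), np.sin(angles)
    return np.concatenate([x1 * cos - x2 * sin, x1 * sin + x2 * cos], axis=-1)

def rove_attention(q, k, v, offset=0):
    """Causal single-head attention with rotated queries, keys, and values.

    Assumption (not confirmed by the article): the output is rotated back
    by the query position, so value j reaches query i rotated by (j - i).
    """
    seq_len, head_dim = q.shape
    ang = rope_angles(seq_len, head_dim, offset)
    qr, kr, vr = rotate(q, ang), rotate(k, ang), rotate(v, ang)

    scores = qr @ kr.T / np.sqrt(head_dim)
    future = np.triu(np.ones((seq_len, seq_len), dtype=bool), k=1)
    scores = np.where(future, -np.inf, scores)

    # Numerically stable softmax over keys.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    out = weights @ vr
    return rotate(out, -ang)  # undo the query-position rotation
```

Under these assumptions, the output for query i is a weighted sum of values, each rotated by an angle that depends only on the offset j - i, which is one way to read the "attentive convolution" description: content-dependent weights combined with a position-relative transform of the values. A quick sanity check is that shifting all positions by a constant `offset` leaves the output unchanged.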
attention, llm, position