Research
Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity
This study presents a formal analysis of functional equivalence in Transformer architectures, focusing on the impact of positional encodings, specifically sinusoidal encodings and rotary positional encodings (RoPE). The authors show that sinusoidal encodings preserve the equivalence structure of traditional attention mechanisms, whereas RoPE reduces the symmetry group and enhances expressivity. These findings clarify how positional encodings influence linear mode connectivity and highlight the practical significance of RoPE in modern Transformer applications.
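To make the notion of functional equivalence concrete, here is a minimal NumPy sketch, not the authors' construction. It relies on the standard observation that attention scores Q K^T are unchanged when the query weights are multiplied by an invertible matrix A and the key weights by the inverse transpose of A. The dimensions, the interleaved RoPE pairing, the frequency base, and the helper names (`rope`, `scores`, `reparam`, `commuting_block`) are all assumptions made for illustration. The sketch checks numerically that a generic A leaves the scores unchanged without RoPE, that the same A changes them once RoPE is applied, and that a block-diagonal A built from 2x2 scaled rotations (one family that commutes with the RoPE rotations) still leaves them unchanged.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d_model, d_head = 5, 8, 4          # illustrative sizes (assumed)

X = rng.normal(size=(n, d_model))     # token representations, one row per position
W_q = rng.normal(size=(d_model, d_head))
W_k = rng.normal(size=(d_model, d_head))


def rope(V, base=10000.0):
    """Rotate each row of V by its position; feature pairs are (2i, 2i+1)."""
    n_pos, d = V.shape
    freqs = base ** (-np.arange(0, d, 2) / d)             # (d/2,)
    angles = np.arange(n_pos)[:, None] * freqs[None, :]   # (n_pos, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    out = np.empty_like(V)
    out[:, 0::2] = V[:, 0::2] * cos - V[:, 1::2] * sin
    out[:, 1::2] = V[:, 0::2] * sin + V[:, 1::2] * cos
    return out


def scores(W_q, W_k, use_rope):
    """Pre-softmax attention scores, with or without RoPE on queries and keys."""
    Q, K = X @ W_q, X @ W_k
    if use_rope:
        Q, K = rope(Q), rope(K)
    return Q @ K.T


def reparam(W_q, W_k, A):
    """Map (W_q, W_k) to (W_q A, W_k A^{-T}); this leaves W_q W_k^T unchanged."""
    return W_q @ A, W_k @ np.linalg.inv(A).T


def commuting_block(a, b, d):
    """Block-diagonal matrix of 2x2 scaled rotations [[a, -b], [b, a]]."""
    block = np.array([[a, -b], [b, a]])
    return np.kron(np.eye(d // 2), block)


# A generic invertible matrix (shifted by 3*I to keep it well conditioned).
A_generic = rng.normal(size=(d_head, d_head)) + 3.0 * np.eye(d_head)
# A matrix that commutes with every RoPE rotation under the pairing above.
A_block = commuting_block(1.5, -0.7, d_head)

same_without_rope = np.allclose(
    scores(W_q, W_k, False), scores(*reparam(W_q, W_k, A_generic), False)
)
same_with_rope_generic = np.allclose(
    scores(W_q, W_k, True), scores(*reparam(W_q, W_k, A_generic), True)
)
same_with_rope_block = np.allclose(
    scores(W_q, W_k, True), scores(*reparam(W_q, W_k, A_block), True)
)

print("generic A, no RoPE  :", same_without_rope)
print("generic A, with RoPE:", same_with_rope_generic)
print("block A, with RoPE  :", same_with_rope_block)
```

An additive encoding such as the sinusoidal one only changes the rows of `X`, so the first check goes through unchanged when positions are added to the input. That is consistent with the summary's statement that sinusoidal encodings preserve the equivalence structure, while the second and third checks illustrate, under the stated assumptions, how RoPE restricts which reparametrizations remain valid.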
attention, functional equivalence, transformers