Conservation Laws for Modern Neural Architectures
This article presents a unified framework for identifying conservation laws in modern neural architectures. A conservation law here is a function of the parameters that stays constant while the model is trained by gradient flow. The framework covers feedforward networks with GELU, SiLU, and SwiGLU activations, as well as multi-head attention with sinusoidal and rotary positional encodings. The study extends existing analyses of gradient descent dynamics to these architectures. Such analyses are central to explaining the behavior of over-parameterized models. The identified invariants are then validated experimentally. For practitioners, a clearer account of which quantities training preserves may help guide the design and optimization of neural architectures.
Keywords: neural architectures, gradient descent, conservation laws
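The abstract does not list the specific invariants the article derives. The following is therefore only a minimal sketch of what a conservation law under gradient descent looks like, built on a toy setup of our own rather than the article's method.

The setup assumes a scalar-output SwiGLU feedforward block, out(x) = sum_j SiLU(x . W_j) * (x . V_j) * w2_j, trained with squared error. Scaling the linear-branch column V_j by lambda and the output weight w2_j by 1/lambda leaves the output unchanged. Because of this rescaling symmetry, Q_j = ||V_j||^2 - w2_j^2 is exactly conserved under gradient flow. With a finite step size eta, it drifts only at order eta^2 per step. The names `init`, `forward_backward`, `invariants`, `gd_step`, and `train`, as well as the sizes and the random data, are all hypothetical.

```python
import math
import random

# Toy setup (an assumption for illustration, not the article's experiment):
# a scalar-output SwiGLU feedforward block
#   out(x) = sum_j silu(x . W[:, j]) * (x . V[:, j]) * w2[j]
# trained on mean squared error with plain gradient descent.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def silu(z):
    return z * sigmoid(z)

def silu_grad(z):
    s = sigmoid(z)
    return s + z * s * (1.0 - s)

def init(seed=0, d=2, m=3, n=8):
    """Hypothetical random weights and a random regression dataset."""
    rng = random.Random(seed)
    W = [[rng.gauss(0, 0.5) for _ in range(m)] for _ in range(d)]  # gate branch
    V = [[rng.gauss(0, 0.5) for _ in range(m)] for _ in range(d)]  # linear branch
    w2 = [rng.gauss(0, 0.5) for _ in range(m)]                     # output weights
    data = [([rng.gauss(0, 1) for _ in range(d)], rng.gauss(0, 1)) for _ in range(n)]
    return W, V, w2, data

def forward_backward(W, V, w2, data):
    """Mean loss 0.5 * (out - y)^2 and its hand-derived gradients."""
    d, m, n = len(W), len(w2), len(data)
    gW = [[0.0] * m for _ in range(d)]
    gV = [[0.0] * m for _ in range(d)]
    gw2 = [0.0] * m
    loss = 0.0
    for x, y in data:
        z = [sum(x[i] * W[i][j] for i in range(d)) for j in range(m)]
        u = [sum(x[i] * V[i][j] for i in range(d)) for j in range(m)]
        h = [silu(z[j]) * u[j] for j in range(m)]
        r = sum(h[j] * w2[j] for j in range(m)) - y  # residual
        loss += 0.5 * r * r / n
        for j in range(m):
            gw2[j] += r * h[j] / n
            for i in range(d):
                gV[i][j] += r * w2[j] * silu(z[j]) * x[i] / n
                gW[i][j] += r * w2[j] * u[j] * silu_grad(z[j]) * x[i] / n
    return loss, gW, gV, gw2

def invariants(V, w2):
    """Q_j = ||V[:, j]||^2 - w2[j]^2, one value per hidden unit."""
    return [sum(row[j] ** 2 for row in V) - w2[j] ** 2 for j in range(len(w2))]

def gd_step(W, V, w2, gW, gV, gw2, lr):
    """One in-place gradient-descent update of all parameters."""
    for i in range(len(W)):
        for j in range(len(w2)):
            W[i][j] -= lr * gW[i][j]
            V[i][j] -= lr * gV[i][j]
    for j in range(len(w2)):
        w2[j] -= lr * gw2[j]

def train(lr, steps, seed=0):
    """Run GD; return the loss curve and the max absolute drift of the Q_j."""
    W, V, w2, data = init(seed)
    q0 = invariants(V, w2)
    losses = []
    for _ in range(steps):
        loss, gW, gV, gw2 = forward_backward(W, V, w2, data)
        losses.append(loss)
        gd_step(W, V, w2, gW, gV, gw2, lr)
    drift = max(abs(a - b) for a, b in zip(invariants(V, w2), q0))
    return losses, drift

if __name__ == "__main__":
    # Same training time (lr * steps = 2.0) with two step sizes; the drift
    # of Q_j is expected to shrink roughly in proportion to lr.
    for lr, steps in [(0.05, 40), (0.005, 400)]:
        losses, drift = train(lr, steps)
        print(f"lr={lr}: loss {losses[0]:.4f} -> {losses[-1]:.4f}, drift {drift:.2e}")
```

The sketch uses hand-derived gradients in pure Python so that it has no dependencies. The conservation comes from an exact identity: <grad_{V_j} L, V_j> equals grad_{w2_j} L * w2_j. Because of it, the first-order change in Q_j cancels, and only the eta^2 * (||grad_{V_j} L||^2 - (grad_{w2_j} L)^2) term remains per step. This toy invariant involves only the linear branch and the output weights. It does not depend on any homogeneity of SiLU.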