Training
SDS-LoRA: Overcoming Anisotropic Gradient Scaling in Low-Rank Adaptation
The paper introduces SDS-LoRA, a low-rank parameterization that addresses anisotropic gradient scaling in Low-Rank Adaptation (LoRA) of large pre-trained models. SDS-LoRA decouples the singular values from the backward pass so that gradients propagate through orthonormal bases, which yields convergence rates that do not depend on the condition number of the low-rank matrices. The reported experiments show improved adaptation performance on natural language and vision benchmarks, narrowing the gap to full fine-tuning.
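To make "anisotropic gradient scaling" concrete, here is a minimal NumPy sketch of the effect in plain LoRA, not the paper's implementation. It assumes the standard factorization W = B A with loss gradient G = dL/dW, so that dL/dB = G Aᵀ and the first-order change in W contributed by a step on B is proportional to G AᵀA. The nonzero eigenvalues of AᵀA are the squared singular values of A, so each direction is rescaled differently unless A has orthonormal rows. The function name and the chosen singular values are hypothetical and only for illustration; the summary above does not specify how SDS-LoRA is parameterized.

```python
import numpy as np

def gradient_scaling_condition(A):
    """Condition number of A @ A.T for a full-rank r x d_in factor A.

    In plain LoRA (W = B @ A), a gradient step on B changes W by a term
    proportional to G @ A.T @ A, so this number measures how unevenly
    the r adapted directions are rescaled (1.0 means uniform scaling).
    """
    s = np.linalg.svd(A, compute_uv=False)  # singular values, descending
    return (s[0] / s[-1]) ** 2

rng = np.random.default_rng(0)
d_in, r = 6, 3

# Build an r x d_in matrix with orthonormal rows via a reduced QR factorization.
Q, _ = np.linalg.qr(rng.standard_normal((d_in, r)))
V = Q.T

# Illustrative (assumed) singular values spanning two orders of magnitude.
sigma = np.array([10.0, 1.0, 0.1])

A_scaled = np.diag(sigma) @ V  # singular values baked into the factor
A_ortho = V                    # orthonormal basis only

cond_scaled = gradient_scaling_condition(A_scaled)
cond_ortho = gradient_scaling_condition(A_ortho)

print(f"singular values in the factor: {cond_scaled:.3g}")
print(f"orthonormal basis only:        {cond_ortho:.3g}")
```

With these assumed values the first condition number comes out near 10^4 and the second near 1, which is the sense in which routing gradients through orthonormal bases removes the dependence on the condition number.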
lora, gradient, adaptation