Training
LoRA-Muon: Spectral Steepest Descent on the Low-Rank Manifold
LoRA-Muon applies the spectral steepest-descent rule from the Muon optimizer to Low-Rank Adaptation (LoRA) fine-tuning. Its learning rates transfer better across model dimensions, and its compute-efficient design avoids QR decomposition and second-moment storage, which suits accelerator environments. In evaluations, a rank-32 LoRA-Muon configuration reached a lower mean validation loss than dense training baselines, suggesting that practitioners can improve model performance while reducing resource consumption.
low-rank adaptation, fine-tuning
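
To make the "spectral steepest descent without QR or second moments" idea in the summary concrete, here is a minimal sketch of a Muon-style update: the gradient matrix is replaced by an approximation of its orthogonal polar factor, computed with matrix multiplications only. The summary does not describe how LoRA-Muon forms its update on the low-rank manifold, so this is not the authors' algorithm. The function names, the classical cubic Newton-Schulz iteration, the step count, and the toy matrices are all assumptions chosen for illustration. Plain Python lists are used so the sketch runs without dependencies.

```python
import math


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the polar factor U V^T of g = U S V^T.

    Uses only matrix multiplications (no SVD, no QR). This is the classical
    cubic Newton-Schulz iteration, an illustrative choice rather than the
    iteration used by any particular Muon implementation.
    """
    fro = math.sqrt(sum(v * v for row in g for v in row))
    if fro == 0.0:
        return [row[:] for row in g]  # zero gradient: nothing to orthogonalize
    # Dividing by the Frobenius norm makes every singular value at most 1,
    # which keeps the iteration inside its convergence region.
    x = [[v / fro for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        # X <- 1.5 X - 0.5 X X^T X pushes each nonzero singular value toward 1.
        x = [[1.5 * a - 0.5 * b for a, b in zip(rx, rb)] for rx, rb in zip(x, xxt_x)]
    return x


def spectral_step(weight, grad, lr):
    """One update step: move against the orthogonalized gradient.

    No per-parameter second-moment buffer is kept, unlike Adam-style methods.
    """
    q = newton_schulz_orthogonalize(grad)
    return [[w - lr * d for w, d in zip(rw, rq)] for rw, rq in zip(weight, q)]


if __name__ == "__main__":
    # Hypothetical 3x2 LoRA factor and its gradient (toy values).
    lora_factor = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
    grad = [[3.0, 1.0], [1.0, 2.0], [0.0, 1.0]]
    print(spectral_step(lora_factor, grad, lr=0.1))
```

Because every nonzero singular value of the update direction is driven toward 1, the step size in spectral norm is set by `lr` alone rather than by the gradient's scale, which is one intuition for why learning rates might transfer across model widths.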