ai-digest.dev
last updated 4 h ago
Training · arXiv cs.AI · 10 d ago

SPRI: SVD-Partitioned Residual Initialization for Data-Constrained MoE Upcycling

The paper presents SVD-Partitioned Residual Initialization (SPRI), a method for upcycling pretrained dense models into sparse Mixture-of-Experts (MoE) models when training data is limited. SPRI initializes the experts from SVD-partitioned residuals of the pretrained feed-forward network (FFN) weights, which increases expert diversity while preserving the pretrained weight structure, and pairs this with a two-stage training strategy to stabilize adaptation. Evaluated on multilingual speech-to-text translation with the CoVoST2 dataset, SPRI improved BLEU and COMET scores over fully fine-tuned dense models and outperformed previous MoE upcycling methods.
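
The summary does not say how the residuals are partitioned or combined, so the sketch below is only one possible reading of the idea, not the paper's algorithm. It assumes each expert starts from the pretrained FFN weight plus a small perturbation rebuilt from its own disjoint group of singular components. The function name `svd_partitioned_experts`, the interleaved grouping, and the `scale` parameter are all hypothetical.

```python
import numpy as np

def svd_partitioned_experts(w, num_experts, scale=0.1):
    """Hypothetical sketch: derive expert weights from disjoint SVD component groups of w."""
    u, s, vt = np.linalg.svd(w, full_matrices=False)
    # Assumption: interleave component indices so every expert receives
    # a mix of large and small singular values.
    groups = [np.arange(k, s.size, num_experts) for k in range(num_experts)]
    experts = []
    for idx in groups:
        # Low-rank piece of w spanned by this expert's singular components.
        part = (u[:, idx] * s[idx]) @ vt[idx, :]
        # Keep the pretrained weights and add an expert-specific perturbation.
        experts.append(w + scale * part)
    return experts

rng = np.random.default_rng(0)
w = rng.standard_normal((8, 16))  # stand-in for a pretrained FFN weight matrix
experts = svd_partitioned_experts(w, num_experts=4)
```

Because the groups are disjoint and together cover every singular component, the perturbations sum back to `scale * w`. Under this assumption each expert stays close to the pretrained weights while differing from the others along its own directions.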

moe · upcycling · training · relevance 0.00 · engagement 0.00