Models
SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs
SoftMoE introduces a soft, differentiable routing mechanism for sparse Mixture-of-Experts (MoE) architectures in large language models. It replaces discrete top-$k$ routing with a truncated soft top-$k$ relaxation based on LapSum. Because the relaxation is differentiable, expert selection can be optimized with gradients, and expert capacity can be allocated flexibly across layers under a global budget constraint while the model remains compatible with autoregressive generation. In practice, SoftMoE aims to improve language modeling performance at lower computational cost, since it activates fewer experts while still meeting the global budget constraint.
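To make the idea of a soft top-$k$ router concrete, here is a minimal NumPy sketch. It is not the SoftMoE or LapSum implementation: it assumes each expert weight is a Laplace CDF of the router logit shifted by a shared threshold, and it finds that threshold by bisection so that the weights sum to $k$. The names `soft_top_k`, `truncate`, `alpha`, and `tau` are hypothetical, and the truncation rule (zeroing weights below `tau`) is only one plausible reading of "truncated".

```python
import numpy as np

def laplace_cdf(z):
    # CDF of the standard Laplace distribution; exp(-|z|) avoids overflow.
    tail = 0.5 * np.exp(-np.abs(z))
    return np.where(z < 0, tail, 1.0 - tail)

def soft_top_k(logits, k, alpha=0.1, iters=60):
    """Soft top-k weights in (0, 1) that sum to approximately k.

    Assumption: weight_i = F((logit_i - b) / alpha), with the shared
    threshold b found by bisection. Requires 0 < k < len(logits).
    """
    logits = np.asarray(logits, dtype=float)
    # Bracket b so that the weight sum is ~n at `lo` and ~0 at `hi`.
    lo = logits.min() - 50.0 * alpha
    hi = logits.max() + 50.0 * alpha
    for _ in range(iters):
        b = 0.5 * (lo + hi)
        total = laplace_cdf((logits - b) / alpha).sum()
        if total > k:
            lo = b  # too much mass selected: raise the threshold
        else:
            hi = b  # too little mass selected: lower the threshold
    return laplace_cdf((logits - 0.5 * (lo + hi)) / alpha)

def truncate(weights, tau=1e-2):
    # Hypothetical truncation: experts with negligible weight are skipped.
    return np.where(weights >= tau, weights, 0.0)

# Router logits for one token over four experts (made-up values).
logits = np.array([2.0, 1.0, 0.1, -1.0])
weights = soft_top_k(logits, k=2, alpha=0.1)
kept = truncate(weights)
print("soft weights:", weights)
print("active experts:", int(np.count_nonzero(kept)))
```

Smaller values of `alpha` push the weights toward a hard 0/1 top-$k$ mask, while larger values spread mass across more experts. Under the same assumptions, a global budget could be sketched by applying `soft_top_k` to the logits of all layers concatenated together, so that the number of active experts varies by layer while the total stays near the budget.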
moe, routing