Hugging Face Blog · 55 d ago

Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

JetBrains has announced Mellum2, a 12-billion-parameter Mixture-of-Experts (MoE) model aimed at natural language processing tasks. The model uses sparse activation: only a subset of its experts is activated for a given input during inference, which reduces the compute per forward pass relative to running every expert while maintaining high accuracy on benchmarks. For practitioners, this makes Mellum2 a candidate for resource-constrained environments, where a large language model can be deployed with less computational overhead.
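To make the idea of sparse activation concrete, here is a minimal sketch of generic top-k expert routing in plain Python. It is not Mellum2's actual router, whose details are not given here; the function names (`top_k_route`, `moe_forward`), the four toy experts, and the router scores are all hypothetical and chosen only for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are a softmax over the selected scores only, so they sum to 1.
    """
    ranked = sorted(
        range(len(router_logits)),
        key=lambda i: router_logits[i],
        reverse=True,
    )
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_forward(x, experts, router_logits, k=2):
    """Weighted sum of the selected experts' outputs.

    Experts that are not selected are never called, which is where the
    compute saving of a sparse MoE layer comes from.
    """
    routes = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in routes)


# Toy setup (assumed for illustration): 4 experts, each a scalar function.
experts = [
    lambda x: x + 1.0,
    lambda x: 2.0 * x,
    lambda x: x * x,
    lambda x: -x,
]
router_logits = [0.1, 2.0, 2.0, -1.0]  # made-up router scores for one input

routes = top_k_route(router_logits, k=2)
output = moe_forward(3.0, experts, router_logits, k=2)
print(routes, output)
```

With `k=2`, only two of the four experts run for this input. In a real MoE layer the experts would be feed-forward networks and the router scores would come from a learned linear projection, but the selection logic follows the same pattern.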

