ai-digest.dev
Models · arXiv cs.AI · 9 d ago

Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

Nemotron 3 Ultra has been released as an open Mixture-of-Experts hybrid Mamba-Attention model with 550 billion total parameters, of which 55 billion are active. It was pre-trained on 20 trillion tokens and supports context lengths of up to 1 million tokens. It reportedly achieves approximately 6x higher inference throughput than current leading LLMs at comparable accuracy. Its LatentMoE design and multi-environment reinforcement learning are aimed at long-running autonomous reasoning tasks, and the model is openly available on Hugging Face.
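To put the parameter counts in perspective, here is a minimal sketch of the sparsity arithmetic. Only the 550 billion and 55 billion figures come from the summary; the function names are hypothetical, and the "2 FLOPs per active parameter per token" estimate is a generic rule of thumb for a forward pass, not a number reported for this model.

```python
# Figures quoted in the summary above.
TOTAL_PARAMS = 550e9
ACTIVE_PARAMS = 55e9


def active_fraction(total: float, active: float) -> float:
    """Share of the model's parameters that are active in a sparse MoE model."""
    return active / total


def approx_forward_flops_per_token(active: float) -> float:
    """Rough forward-pass cost per token.

    Assumption: about 2 FLOPs per active parameter per token. This is a
    generic approximation that ignores sequence-length-dependent costs;
    it is not a figure from the source.
    """
    return 2.0 * active


fraction = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
flops = approx_forward_flops_per_token(ACTIVE_PARAMS)

print(f"active fraction: {fraction:.0%}")
print(f"approx. forward FLOPs per token: {flops:.2e}")
```

Under these assumptions, one tenth of the parameters are active, so per-token compute is closer to that of a 55-billion-parameter dense model than a 550-billion-parameter one. That is one plausible contributor to the throughput claim, though the summary does not break the speedup down.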

Tags: mixture-of-experts, language model, agentic reasoning