Nemotron 3 Ultra: Open, Efficient Mixture-of-Experts Hybrid Mamba-Transformer Model for Agentic Reasoning

2026-06-12 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

Nemotron 3 Ultra is a new Mixture-of-Experts (MoE) hybrid Mamba-Attention language model with 550 billion total and 55 billion active parameters. It was pre-trained on 20 trillion text tokens, its context length was then extended to 1M tokens, and it was post-trained with Supervised Fine-Tuning (SFT), Reinforcement Learning (RL), and Multi-teacher On-Policy Distillation (MOPD). The model incorporates key technologies such as LatentMoE, Multi-Token Prediction (MTP), NVFP4 pre-training, multi-environment Reinforcement Learning with Verifiable Rewards (RLVR), MOPD, and reasoning budget control. Nemotron 3 Ultra achieves up to ~6x higher inference throughput than publicly available LLMs while maintaining on-par accuracy. Its accuracy, throughput, and 1M-token context length make it well suited to long-running autonomous agentic tasks. The base, post-trained, and quantized checkpoints, along with the training data and recipe, are open-sourced on Hugging Face.
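Why a hybrid design helps at a 1M-token context can be seen with simple arithmetic. Each attention layer must cache one key and one value vector per token, while a Mamba layer carries a fixed-size state. The sketch below compares an all-attention stack with a hybrid that keeps only a few attention layers; every layer count and head size is a hypothetical placeholder, not Nemotron 3 Ultra's published configuration.

```python
# Back-of-the-envelope KV-cache size for one 1M-token sequence.
# Every layer count and head size below is a HYPOTHETICAL placeholder,
# not Nemotron 3 Ultra's published configuration.

def kv_cache_bytes(attn_layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes to cache one key and one value vector (the factor 2)
    per token, per KV head, per attention layer."""
    return 2 * attn_layers * kv_heads * head_dim * seq_len * bytes_per_elem


SEQ_LEN = 1_000_000            # 1M-token context
TOTAL_LAYERS = 96              # hypothetical depth of an all-attention baseline
HYBRID_ATTN_LAYERS = 8         # hypothetical: the other layers are Mamba (fixed-size state)
KV_HEADS, HEAD_DIM = 8, 128    # hypothetical grouped-query attention shape

all_attention = kv_cache_bytes(TOTAL_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)
hybrid = kv_cache_bytes(HYBRID_ATTN_LAYERS, KV_HEADS, HEAD_DIM, SEQ_LEN)

# The Mamba layers' recurrent state does not grow with SEQ_LEN, so it is omitted.
print(f"all-attention KV cache: {all_attention / 2**30:.1f} GiB")
print(f"hybrid KV cache:        {hybrid / 2**30:.1f} GiB")
print(f"reduction:              {all_attention / hybrid:.0f}x")
```

With these placeholder numbers the cache shrinks by 96 / 8 = 12x, from about 366 GiB to about 31 GiB at 16-bit precision; the real ratio depends on how many attention layers the model actually keeps.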

Key takeaway

For AI engineers developing autonomous agents or deploying large language models, Nemotron 3 Ultra is a compelling open-source option. Its up to ~6x higher inference throughput and 1M-token context length directly address the throughput and context limits that constrain agentic tasks. Evaluate its base, post-trained, or quantized checkpoints from Hugging Face to accelerate agent development and deployment.

Key insights

Nemotron 3 Ultra is an open-source hybrid MoE Mamba-Transformer that achieves high throughput and accuracy for agentic AI.
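The "Mamba" half of the hybrid processes a sequence with a recurrence rather than attention. The sketch below is a heavily simplified, single-channel selective scan in the spirit of Mamba, assuming a scalar state and scalar parameters; real Mamba layers use vector states, diagonal state matrices, and input-dependent B and C projections, and nothing here reflects Nemotron 3 Ultra's actual layer code.

```python
import math
from typing import List


def softplus(z: float) -> float:
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def selective_scan(xs: List[float], w_delta: float, a: float,
                   b: float, c: float) -> List[float]:
    """Single-channel, scalar-state sketch of a Mamba-style selective scan.

    delta_t = softplus(w_delta * x_t)          # input-dependent step size
    h_t     = exp(delta_t * a) * h_{t-1} + delta_t * b * x_t
    y_t     = c * h_t
    With a < 0 the decay exp(delta_t * a) lies in (0, 1). The state h is a
    single float, so memory per step is constant in sequence length, unlike
    attention, which reads every cached key at every step.
    """
    h = 0.0
    ys = []
    for x in xs:
        delta = softplus(w_delta * x)
        h = math.exp(delta * a) * h + delta * b * x
        ys.append(c * h)
    return ys


ys = selective_scan([1.0, 0.0, 0.0], w_delta=1.0, a=-1.0, b=1.0, c=1.0)
print(ys)
```

Once the input drops to zero, the step size becomes softplus(0) = ln 2, the decay becomes exp(-ln 2) = 0.5, and the state halves on each step: how fast the layer forgets is chosen from the input itself, which is what "selective" refers to.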

Principles

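Hybrid Mamba-Attention models combine the linear-time, fixed-memory sequence processing of Mamba layers with the precise in-context recall of attention layers.
MoE architectures enhance efficiency and scale by activating only a subset of experts per token (here 55B of 550B parameters, i.e. 10%).
Multi-stage training (pre-training, context extension, then SFT, RL, and MOPD) improves model capabilities.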
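The MoE principle, that only a fraction of parameters is active per token, can be illustrated with a generic top-k router. This is a plain top-k softmax-gated MoE sketch, not LatentMoE, whose internals this summary does not describe; the toy experts and router weights are hypothetical.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)                        # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(x: Vector, router: List[Vector], experts: List[Expert],
                k: int) -> Vector:
    """Generic top-k MoE layer for a single token (not LatentMoE).

    The router scores every expert, but only the k highest-scoring experts
    are evaluated; their outputs are mixed with softmax gates renormalised
    over the selected k.
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in router]
    top = sorted(range(len(experts)), key=lambda i: logits[i], reverse=True)[:k]
    gates = softmax([logits[i] for i in top])
    out = [0.0] * len(x)
    for g, i in zip(gates, top):
        y = experts[i](x)                  # only the selected experts run
        out = [o + g * y_j for o, y_j in zip(out, y)]
    return out


# Toy demo (hypothetical): 4 experts that scale their input; record which run.
calls: List[int] = []

def make_expert(idx: int, factor: float) -> Expert:
    def expert(v: Vector) -> Vector:
        calls.append(idx)
        return [factor * t for t in v]
    return expert

experts = [make_expert(i, float(i + 1)) for i in range(4)]
router = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.5, 0.5]]
out = moe_forward([2.0, 1.0], router, experts, k=2)
print(sorted(calls), out)
```

Here the router logits are 2, 1, -2, and 1.5, so with k=2 only experts 0 and 3 execute; compute per token scales with k, while capacity scales with the total number of experts.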

Method

Pre-training on 20 trillion tokens, extending the context to 1M tokens, then post-training via SFT, RL, and Multi-teacher On-Policy Distillation (MOPD). Key technologies include LatentMoE, MTP, NVFP4, RLVR, and reasoning budget control.
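Reasoning budget control caps how many tokens the model spends "thinking" before it answers. The summary does not say how Nemotron 3 Ultra enforces the cap, so the sketch below shows one plausible decoding-side approach under stated assumptions: a hypothetical `</think>` marker ends the reasoning phase, and `step` stands in for a model that returns the next token.

```python
from typing import Callable, List

END_THINK = "</think>"  # hypothetical end-of-reasoning marker


def generate_with_budget(step: Callable[[List[str]], str], prompt: List[str],
                         think_budget: int, max_answer: int,
                         eos: str = "<eos>") -> List[str]:
    """Decode with a cap on reasoning tokens.

    Count reasoning tokens as they are emitted; if the count reaches
    think_budget before the model closes its reasoning itself, append
    END_THINK so the remaining tokens go to the final answer.
    """
    tokens = list(prompt)
    thought = 0
    while thought < think_budget:
        tok = step(tokens)
        tokens.append(tok)
        if tok == END_THINK:               # model finished reasoning on its own
            break
        thought += 1
    else:
        tokens.append(END_THINK)           # budget exhausted: force the switch
    for _ in range(max_answer):
        tok = step(tokens)
        tokens.append(tok)
        if tok == eos:
            break
    return tokens


# Toy "model" (hypothetical): thinks forever unless told to stop, then answers.
def toy(tokens: List[str]) -> str:
    if END_THINK in tokens:
        return "<eos>" if tokens[-1] == "answer" else "answer"
    return "hmm"

print(generate_with_budget(toy, ["Q"], think_budget=3, max_answer=5))
```

The toy model never stops thinking on its own, so the budget of 3 forces the marker after three reasoning tokens; a model that emits the marker earlier simply keeps its shorter trace.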

In practice

Use Nemotron 3 Ultra for agentic workflows.
Explore the open-sourced checkpoints on Hugging Face.
Consider NVFP4 for efficient pre-training.
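NVFP4 stores values as 4-bit E2M1 floats that share one scale factor per 16-element block (NVIDIA's format also stores that block scale in FP8 E4M3 and adds a per-tensor FP32 scale). The fake-quantization sketch below shows only the block-scaling idea: it keeps the scale in full precision and uses simple nearest rounding, so it is an illustration of the number format, not of Nemotron's NVFP4 training recipe.

```python
from typing import List, Tuple

# Non-negative magnitudes representable by a 4-bit E2M1 float; sign is a separate bit.
E2M1_GRID = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16  # elements sharing one scale factor


def nearest_grid(target: float) -> float:
    """Nearest E2M1 magnitude; ties go to the smaller magnitude."""
    return min(E2M1_GRID, key=lambda g: abs(target - g))


def quantize_block(xs: List[float]) -> Tuple[float, List[float]]:
    """Map a block onto the E2M1 grid with one shared scale.

    The scale sends the block's largest magnitude to 6.0, the largest E2M1
    value. Simplification: the scale stays in full precision here, whereas
    NVFP4 stores it in FP8 (E4M3).
    """
    amax = max(abs(x) for x in xs)
    scale = amax / 6.0 if amax > 0 else 1.0
    codes = []
    for x in xs:
        mag = nearest_grid(abs(x) / scale)
        codes.append(mag if x >= 0 else -mag)
    return scale, codes


def fake_quantize(xs: List[float]) -> List[float]:
    """Quantize then dequantize block by block, returning the lossy values."""
    out: List[float] = []
    for start in range(0, len(xs), BLOCK_SIZE):
        scale, codes = quantize_block(xs[start:start + BLOCK_SIZE])
        out.extend(code * scale for code in codes)
    return out


weights = [6.0, -3.0, 0.3, 1.2] + [0.1] * 12 + [12.0, -0.7]
print(fake_quantize(weights))
```

Because each 16-element block gets its own scale, the outlier 12.0 in the second block does not crush the resolution of the first block; the small 0.1 values still round to zero, which is the kind of precision loss NVFP4 training recipes must manage.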

Topics

Nemotron 3 Ultra
Mixture-of-Experts
Mamba-Transformer
Agentic AI
LLM Inference
Open-Source Models

Best for: AI Architect, MLOps Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.