NVIDIA Nemotron 3 Ultra Powers Faster, More Efficient Reasoning for Long-Running Agents

2026-06-04 · Source: NVIDIA Technical Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, medium

Summary

NVIDIA has released Nemotron 3 Ultra, an open 550B-parameter Mixture-of-Experts (MoE) model with 55B active parameters, designed to help long-running AI agents finish tasks faster and at lower operating cost. NVIDIA reports 5x higher throughput than other open models in its class and up to 30% lower task-completion cost on benchmarks such as SWE-bench. The model combines several techniques: post-training for agent harnesses, a hybrid Mamba-Transformer architecture, NVFP4 precision, LatentMoE, and multi-token prediction. It also uses Multi-Teacher On-Policy Distillation (MOPD) to improve reasoning efficiently across domains. Pre-training builds on a 10T-token foundation and adds 212B new tokens: 4B of synthetic legal data, 35B of Wiki-based data, and 173B of refreshed GitHub tokens. NVIDIA also launched Nemotron 3.5 Content Safety, a 4B guardrail model, and Nemotron 3.5 ASR for multilingual voice-native agents, both under the permissive OpenMDW-1.1 license.
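The figures in the summary can be sanity-checked with a few lines of arithmetic. The sketch below uses only numbers quoted above; the variable names are illustrative, and the 4-bit weight footprint is a rough lower bound that assumes every parameter is stored at 4 bits and ignores scaling metadata and any layers kept at higher precision.

```python
# Figures quoted in the summary, in billions (tokens or parameters).
new_token_sources_b = {
    "synthetic_legal": 4,
    "wiki_based": 35,
    "refreshed_github": 173,
}
pretraining_base_b = 10_000  # 10T tokens
total_params_b = 550
active_params_b = 55

# The three listed sources should add up to the quoted 212B new tokens.
total_new_tokens_b = sum(new_token_sources_b.values())

# Share of parameters active per token, and new data relative to the base corpus.
active_fraction = active_params_b / total_params_b
new_token_share = total_new_tokens_b / pretraining_base_b

# Assumption: 4 bits per weight for all parameters (lower bound, see lead-in).
weights_gb_at_4bit = total_params_b * 4 / 8

print(total_new_tokens_b)          # 212
print(f"{active_fraction:.0%}")    # 10%
print(f"{new_token_share:.2%}")    # 2.12%
print(weights_gb_at_4bit)          # 275.0
```

The 10% active fraction is what makes the MoE design relevant to throughput: each token exercises roughly 55B parameters rather than the full 550B.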

Key takeaway

For AI engineers building long-running agentic systems, Nemotron 3 Ultra offers a reported 5x throughput gain over comparable open models and up to 30% lower task-completion cost on complex reasoning tasks. Its open weights, data, and recipes can be used with NeMo libraries to fine-tune for your domain, or the model can be deployed via NVIDIA NIM for secure, efficient agent orchestration.

Key insights

Nemotron 3 Ultra optimizes long-running AI agents with a specialized Mixture-of-Experts model for faster, more cost-effective reasoning.

Principles

Agent workflows benefit from specialized models for orchestration.
Hybrid architectures can balance context and recall.
Multi-teacher distillation improves domain-specific reasoning.

Method

Multi-Teacher On-Policy Distillation (MOPD) trains a student model on its own rollouts: the student generates responses, and multiple specialized teacher models score them with dense reward signals, asynchronously and over repeated iterations.
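A toy sketch can make the loop concrete. It is not NVIDIA's implementation: the summary does not specify the reward form, so this example assumes a per-token reward equal to the teacher's log-probability minus the student's, which in expectation pushes the student toward each teacher by reducing the reverse KL divergence. The "models" are plain categorical distributions over a four-token vocabulary, the domain names and all function names are hypothetical, and the teachers are queried synchronously for simplicity.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def kl(p, q):
    """KL(p || q) for two categorical distributions."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def mopd_step(student_logits, teachers, rng, rollout_len=16, lr=0.2):
    """One iteration: per domain, sample from the student, score with that teacher."""
    for domain, teacher_probs in teachers.items():
        logits = student_logits[domain]
        probs = softmax(logits)
        grad = [0.0] * len(logits)
        for _ in range(rollout_len):
            # On-policy: the token comes from the student's own distribution.
            tok = rng.choices(range(len(probs)), weights=probs)[0]
            # Dense (per-token) reward from the domain's teacher (assumed form).
            reward = math.log(teacher_probs[tok]) - math.log(probs[tok])
            # Policy-gradient term: reward * d log p_student(tok) / d logits.
            for i in range(len(logits)):
                indicator = 1.0 if i == tok else 0.0
                grad[i] += reward * (indicator - probs[i])
        for i in range(len(logits)):
            logits[i] += lr * grad[i] / rollout_len

def train(steps=500, seed=0):
    rng = random.Random(seed)
    # Hypothetical specialized teachers, one per domain.
    teachers = {
        "code": [0.7, 0.1, 0.1, 0.1],
        "math": [0.1, 0.1, 0.1, 0.7],
    }
    student_logits = {d: [0.0] * 4 for d in teachers}  # student starts uniform
    before = {d: kl(softmax(student_logits[d]), teachers[d]) for d in teachers}
    for _ in range(steps):
        mopd_step(student_logits, teachers, rng)
    after = {d: kl(softmax(student_logits[d]), teachers[d]) for d in teachers}
    return before, after

before, after = train()
print(before)
print(after)
```

Because the reward shrinks toward zero as the student matches a teacher, the updates become quieter near convergence, which is one reason dense token-level signals are attractive compared with a single end-of-rollout score.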

In practice

Use Nemotron 3 Ultra for complex agent orchestration tasks.
Deploy NVFP4 precision checkpoints across NVIDIA GPU architectures.
Fine-tune Nemotron 3 Ultra using LoRA, SFT, or RL via NeMo libraries.
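To illustrate why LoRA is a practical fine-tuning option for a model of this size, here is a minimal, framework-free sketch of the low-rank update it adds to a frozen weight matrix. It does not use the NeMo API; the helper names, the tiny matrices, and the 4096-wide layer in the parameter count are all illustrative assumptions, not Nemotron 3 Ultra's actual dimensions.

```python
def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]

def lora_forward(x, W, A, B, alpha):
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B train."""
    rank = len(A)
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]

def lora_param_counts(d_in, d_out, rank):
    """Return (full-matrix parameters, LoRA adapter parameters)."""
    return d_in * d_out, rank * (d_in + d_out)

W = [[1.0, 2.0, 0.0], [0.0, 1.0, -1.0]]  # frozen weights, d_out=2, d_in=3
A = [[0.5, -0.5, 1.0]]                   # rank x d_in, rank=1
B_zero = [[0.0], [0.0]]                  # d_out x rank, zero at initialization
B_trained = [[1.0], [-1.0]]              # hypothetical values after training
x = [1.0, 2.0, 3.0]

y_base = lora_forward(x, W, A, B_zero, alpha=2.0)
y_adapted = lora_forward(x, W, A, B_trained, alpha=2.0)
print(y_base)      # [5.0, -1.0]
print(y_adapted)   # [10.0, -6.0]

full, adapter = lora_param_counts(4096, 4096, 16)
print(full, adapter)  # 16777216 131072
```

With B initialized to zero, the adapted layer starts out identical to the base layer, and for the hypothetical 4096 x 4096 layer a rank-16 adapter trains under 1% of the parameters that full fine-tuning would touch.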

Topics

AI Agents
Nemotron 3 Ultra
Mixture-of-Experts
Multi-Teacher Distillation
NVFP4 Quantization
Agent Orchestration
Content Safety AI

Best for: AI Architect, MLOps Engineer, NLP Engineer, AI Engineer, Machine Learning Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.