Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning

2026-03-11 · Source: NVIDIA Technical Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Advanced, medium

Summary

NVIDIA has released Nemotron 3 Super, a 120B total, 12B active-parameter model designed for complex multi-agent AI applications like software development and cybersecurity. This model addresses "context explosion" with a native 1M-token context window and mitigates the "thinking tax" through a hybrid Mixture-of-Experts (MoE) architecture, delivering over 5x throughput compared to its predecessor. Key innovations include Latent MoE for increased expert consultation at the same cost, Multi-token Prediction (MTP) for faster long sequence generation and stronger reasoning, a Hybrid Mamba-Transformer backbone for efficiency and precision, and native NVFP4 pretraining optimized for NVIDIA Blackwell, which cuts memory requirements and speeds up inference by 4x on NVIDIA B200. Nemotron 3 Super achieved an 85.6% score on PinchBench, an agentic LLM benchmark, making it a leading open model in its class.

Key takeaway

For AI Architects designing multi-agent systems, Nemotron 3 Super offers a robust, open-source foundation to overcome "context explosion" and "thinking tax." Its hybrid MoE and native NVFP4 pretraining enable efficient, high-accuracy reasoning for complex tasks. You should explore its open weights, datasets, and deployment cookbooks to integrate it into your infrastructure, especially for applications requiring long-term memory and specialized problem-solving.

Key insights

Nemotron 3 Super optimizes multi-agent AI with a hybrid MoE architecture, 1M-token context, and NVFP4 pretraining.

Principles

Hybrid architectures balance efficiency and precision.
Native low-precision training preserves accuracy.
Multi-token prediction improves reasoning and speed.

Method

Nemotron 3 Super is trained in three phases: NVFP4 pretraining on 25 trillion tokens, supervised fine-tuning on 7 million samples, and multi-environment reinforcement learning across 21 configurations using NeMo Gym and NeMo RL.

In practice

Use Nemotron 3 Super for complex multi-agent tasks.
Deploy with NVIDIA NIM for optimized inference.
Fine-tune with LoRA/SFT or GRPO/DAPO cookbooks.

Topics

Agentic AI Systems
Mixture-of-Experts
Hybrid Mamba-Transformer
Multi-token Prediction
NVFP4 Training

Code references

Best for: AI Architect, NLP Engineer, AI Scientist, AI Engineer, Machine Learning Engineer, AI Researcher

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.