Introducing Nemotron 3 Super: An Open Hybrid Mamba-Transformer MoE for Agentic Reasoning
Summary
NVIDIA has released Nemotron 3 Super, a 120B total, 12B active-parameter model designed for complex multi-agent AI applications like software development and cybersecurity. This model addresses "context explosion" with a native 1M-token context window and mitigates the "thinking tax" through a hybrid Mixture-of-Experts (MoE) architecture, delivering over 5x throughput compared to its predecessor. Key innovations include Latent MoE for increased expert consultation at the same cost, Multi-token Prediction (MTP) for faster long sequence generation and stronger reasoning, a Hybrid Mamba-Transformer backbone for efficiency and precision, and native NVFP4 pretraining optimized for NVIDIA Blackwell, which cuts memory requirements and speeds up inference by 4x on NVIDIA B200. Nemotron 3 Super achieved an 85.6% score on PinchBench, an agentic LLM benchmark, making it a leading open model in its class.
Key takeaway
For AI Architects designing multi-agent systems, Nemotron 3 Super offers a robust, open-source foundation to overcome "context explosion" and "thinking tax." Its hybrid MoE and native NVFP4 pretraining enable efficient, high-accuracy reasoning for complex tasks. You should explore its open weights, datasets, and deployment cookbooks to integrate it into your infrastructure, especially for applications requiring long-term memory and specialized problem-solving.
Key insights
Nemotron 3 Super optimizes multi-agent AI with a hybrid MoE architecture, 1M-token context, and NVFP4 pretraining.
Principles
- Hybrid architectures balance efficiency and precision.
- Native low-precision training preserves accuracy.
- Multi-token prediction improves reasoning and speed.
Method
Nemotron 3 Super is trained in three phases: NVFP4 pretraining on 25 trillion tokens, supervised fine-tuning on 7 million samples, and multi-environment reinforcement learning across 21 configurations using NeMo Gym and NeMo RL.
In practice
- Use Nemotron 3 Super for complex multi-agent tasks.
- Deploy with NVIDIA NIM for optimized inference.
- Fine-tune with LoRA/SFT or GRPO/DAPO cookbooks.
Topics
- Agentic AI Systems
- Mixture-of-Experts
- Hybrid Mamba-Transformer
- Multi-token Prediction
- NVFP4 Training
Code references
Best for: AI Architect, NLP Engineer, AI Scientist, AI Engineer, Machine Learning Engineer, AI Researcher
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.