Nvidia’s Open Salvo, OpenAI’s Amazon Deal, Grok Cuts Video Prices, Recursive Language Models

2026-03-27 · Source: The Batch | DeepLearning.AI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Cloud Computing & IT Infrastructure · Depth: Advanced, long

Summary

Nvidia has released Nemotron 3 Super 120B-A12B, an open-source large language model optimized for agentic applications. The release includes weights, training datasets, and recipes. The model, part of a planned family, uses a hybrid Mamba-2/transformer/mixture-of-experts architecture with 120 billion parameters (12 billion active per token) and supports up to 1 million tokens for both input and output. Trained on 25 trillion tokens across 20 natural languages and 43 programming languages, it offers tool calling, structured outputs, and multiple reasoning modes. At 442 output tokens per second, Nemotron 3 Super is the fastest open-weights model in its size class, and it leads on the PinchBench test for agentic tasks. It is free to download for commercial and noncommercial use, and API access is priced around $0.30/$0.80 per 1 million input/output tokens.
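The throughput and pricing figures above lend themselves to a quick back-of-the-envelope estimate. The following is a minimal sketch that uses only the numbers quoted in this summary; the function name estimate_request is hypothetical, the API prices are approximate ("around"), and real latency also depends on prompt processing and the serving setup, which this ignores.

```python
# Figures quoted in the summary; the API prices are approximate.
INPUT_USD_PER_M = 0.30        # USD per 1 million input tokens
OUTPUT_USD_PER_M = 0.80       # USD per 1 million output tokens
OUTPUT_TOKENS_PER_SEC = 442   # reported output speed

# Share of parameters active per token: 12B of 120B.
ACTIVE_FRACTION = 12e9 / 120e9


def estimate_request(input_tokens: int, output_tokens: int) -> dict:
    """Rough cost and generation time for one request (hypothetical helper)."""
    cost_usd = (
        input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    ) / 1_000_000
    # Generation time only; ignores prompt processing and network overhead.
    generation_seconds = output_tokens / OUTPUT_TOKENS_PER_SEC
    return {"cost_usd": cost_usd, "generation_seconds": generation_seconds}


if __name__ == "__main__":
    # Example: an agent step with a 50,000-token context and a 2,000-token reply.
    print(estimate_request(50_000, 2_000))
    print(f"Active parameter share per token: {ACTIVE_FRACTION:.0%}")
```

Under these assumptions, long agentic contexts dominate cost through input tokens, while output length dominates generation time.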

Key takeaway

AI architects and NLP engineers building agentic applications should evaluate Nemotron 3 Super 120B-A12B. Its output speed of 442 tokens per second, strong results on agentic benchmarks such as PinchBench, open-source availability, and competitive API pricing make it a compelling option for building efficient, high-performance AI agents. Consider it for your next project to benefit from its optimized architecture and extensive training data.

Key insights

Nvidia's Nemotron 3 Super is a fast, open-source LLM for agents that relies on a hybrid architecture and hardware-software co-design.

Principles

Hybrid architectures can optimize for both speed and long-range context.
Co-designing hardware and software enhances model performance.
Open-source models can drive adoption and ecosystem growth.

Method

Nemotron 3 Super uses a hybrid architecture that combines Mamba-2, attention, and LatentMoE layers with multi-token prediction heads. It was pretrained in NVFP4, a reduced-precision 4-bit floating-point format, and fine-tuned with PivotRL on diverse sequences for agentic tasks.
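To illustrate why a mixture-of-experts model activates only a fraction of its parameters per token (12 billion of 120 billion here), the following is a toy sketch of generic top-k expert routing. It is not Nvidia's LatentMoE design, whose details this summary does not give; the expert count, the value of k, and all function names are hypothetical.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    ranked = sorted(
        range(len(router_logits)), key=lambda i: router_logits[i], reverse=True
    )
    top = ranked[:k]
    # Renormalize the router scores over the selected experts only.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Weighted sum of the outputs of the selected experts; the rest never run."""
    return sum(w * experts[i](x) for i, w in route_top_k(router_logits, k))


# Hypothetical toy setup: 8 scalar "experts", each scaling its input.
experts = [lambda x, s=s: s * x for s in range(1, 9)]

if __name__ == "__main__":
    logits = [0.0, 0.0, 5.0, 0.0, 0.0, 5.0, 0.0, 0.0]
    # Only experts 2 and 5 are evaluated for this token.
    print(route_top_k(logits, k=2))
    print(moe_forward(2.0, experts, logits, k=2))
```

In this sketch, only k of the experts do any work for a given token, which is the general mechanism behind a model having far fewer active parameters than total parameters.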

In practice

Utilize Nemotron 3 Super for agentic applications requiring high speed.
Explore its multi-token prediction for faster inference.
Leverage the open weights and datasets for custom development.

Topics

AI Regulation
NVIDIA Nemotron 3 Super
OpenAI AWS Partnership
AI Agents
xAI Grok Imagine 1.0

Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by The Batch | DeepLearning.AI.