Nvidia’s Open Salvo, OpenAI’s Amazon Deal, Grok Cuts Video Prices, Recursive Language Models
Summary
Nvidia has released Nemotron 3 Super 120B-A12B, an open-source large language model optimized for agentic applications, together with its weights, training datasets, and recipes. The model, part of a planned family, uses a hybrid Mamba-2/transformer/mixture-of-experts architecture with 120 billion parameters (12 billion active per token) and supports up to 1 million tokens for both input and output. It was trained on 25 trillion tokens across 20 natural languages and 43 programming languages, and it offers tool calling, structured outputs, and multiple reasoning modes. Nemotron 3 Super generates 442 output tokens per second, making it the fastest open-weights model in its size class, and it leads the PinchBench test for agentic tasks. The weights are free to download for commercial and noncommercial use, and API access is priced at around $0.30 per 1 million input tokens and $0.80 per 1 million output tokens.
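To make the quoted price and speed figures concrete, here is a minimal back-of-the-envelope sketch. It uses only the numbers stated in the summary; the function name `estimate_request` and the example token counts are illustrative assumptions, and the time estimate covers output generation only (it ignores prompt processing, queuing, and network latency).

```python
# Figures quoted in the summary; treat them as approximate.
INPUT_PRICE_PER_M = 0.30     # USD per 1 million input tokens
OUTPUT_PRICE_PER_M = 0.80    # USD per 1 million output tokens
OUTPUT_TOKENS_PER_SEC = 442  # reported output speed


def estimate_request(input_tokens: int, output_tokens: int) -> tuple[float, float]:
    """Return (cost in USD, output generation time in seconds) for one request.

    Hypothetical helper for illustration; not part of any Nvidia API.
    """
    cost = (
        input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    ) / 1_000_000
    seconds = output_tokens / OUTPUT_TOKENS_PER_SEC
    return cost, seconds


if __name__ == "__main__":
    # Assumed workload: an agent step with a long context and a short reply.
    cost, seconds = estimate_request(input_tokens=50_000, output_tokens=2_000)
    print(f"cost ~ ${cost:.4f}, generation ~ {seconds:.1f} s")
```

At these rates, input tokens dominate the bill for long-context agent steps, while the output length determines how long generation takes.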
Key takeaway
AI architects and NLP engineers building agentic applications should evaluate Nemotron 3 Super 120B-A12B. Its output speed of 442 tokens per second, strong results on agentic benchmarks such as PinchBench, open-source availability, and competitive API pricing make it a compelling option for building efficient, high-performance AI agents. The open weights, datasets, and recipes also leave room for custom development.
Key insights
Nvidia's Nemotron 3 Super is a fast, open-source LLM for agents that combines a hybrid architecture with co-designed hardware and software optimization.
Principles
- Hybrid architectures can optimize for both speed and long-range context.
- Co-designing hardware and software enhances model performance.
- Open-source models can drive adoption and ecosystem growth.
Method
Nemotron 3 Super uses a hybrid architecture that combines Mamba-2, attention, and LatentMoE layers with multi-token prediction heads. It was pretrained in the reduced-precision NVFP4 format and fine-tuned with PivotRL on diverse sequences for agentic tasks.
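The gap between 120 billion total parameters and 12 billion active per token comes from mixture-of-experts routing: each token is sent to only a few experts. The sketch below shows generic top-k routing in plain Python. It is not Nvidia's LatentMoE, whose details are not described here; the expert count, the value of `k`, and the scalar "experts" are toy assumptions chosen for illustration.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts and renormalize their weights to sum to 1."""
    probs = softmax(router_logits)
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    norm = sum(probs[i] for i in top)
    return [(i, probs[i] / norm) for i in top]


def moe_layer(x, experts, router_logits, k):
    """Weighted sum of the outputs of the selected experts only.

    Experts that are not selected are never evaluated, which is why
    only a fraction of the parameters is active for a given token.
    """
    out = 0.0
    for idx, weight in route_top_k(router_logits, k):
        out += weight * experts[idx](x)
    return out


if __name__ == "__main__":
    random.seed(0)
    # Toy experts: scalar functions standing in for feed-forward blocks.
    experts = [lambda x, s=s: s * x for s in range(1, 9)]
    # In a real model the router logits are computed from the token's
    # hidden state; here they are random placeholders.
    logits = [random.gauss(0, 1) for _ in experts]
    chosen = route_top_k(logits, k=2)
    print("experts used:", [i for i, _ in chosen], "of", len(experts))
    print("output:", moe_layer(1.0, experts, logits, k=2))
```

With 8 toy experts and `k=2`, a quarter of the expert parameters run per token; the same idea at scale yields a model whose active parameter count is far below its total.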
In practice
- Utilize Nemotron 3 Super for agentic applications requiring high speed.
- Explore its multi-token prediction for faster inference.
- Leverage the open weights and datasets for custom development.
Topics
- AI Regulation
- NVIDIA Nemotron 3 Super
- OpenAI AWS Partnership
- AI Agents
- xAI Grok Imagine 1.0
Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by The Batch | DeepLearning.AI.