Now in Foundry: NVIDIA Nemotron-3-Super-120B-A12B, IBM Granite-4.0-1b-Speech, and Sarvam-105B
Summary
Microsoft Foundry's Hugging Face collection now includes three new models: NVIDIA Nemotron-3-Super-120B-A12B, IBM Granite-4.0-1b-Speech, and Sarvam-105B. NVIDIA's Nemotron-3-Super-120B-A12B is a hybrid Latent Mixture-of-Experts (MoE) model with 12B active parameters. It supports a context window of up to 1 million tokens and features configurable reasoning and native speculative decoding. IBM Granite-4.0-1b-Speech is a compact ~1B-parameter model for automatic speech recognition and speech translation (ASR/AST). It achieves a 5.52% average Word Error Rate (WER) at 280× real-time speed, with runtime keyword biasing and bidirectional translation for six languages. Sarvam-105B is a 105B MoE model with 10.3B active parameters, optimized for 22 Indian languages and English, with strong agentic performance on web search and task-planning benchmarks.
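To make the two speech figures above concrete, here is a minimal sketch of how a word error rate and a real-time speed multiplier are commonly computed. The function names and the sample sentences are illustrative assumptions; this is not IBM's evaluation code, and published WER numbers usually also depend on text normalization that is omitted here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def processing_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time needed to transcribe audio at `speedup` times real time."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


if __name__ == "__main__":
    # Hypothetical transcript pair: one dropped word, one substituted word.
    wer = word_error_rate("turn on the kitchen lights", "turn on kitchen light")
    print(f"WER: {wer:.2%}")
    print(f"1 hour of audio at 280x: {processing_seconds(3600, 280):.1f} s")
```

Read this way, a 5.52% average WER means roughly 5 to 6 word-level errors per 100 reference words, and 280× real-time means an hour of audio takes about 13 seconds to process.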
Key takeaway
For AI Architects evaluating models for specialized applications, each of these additions to Microsoft Foundry targets a different need. Consider NVIDIA Nemotron-3-Super-120B-A12B for ultra-long-context and agentic workflows that require configurable reasoning. IBM Granite-4.0-1b-Speech suits compact, high-speed multilingual ASR/AST where domain vocabulary must be adapted at runtime. Sarvam-105B provides agentic capabilities and broad Indian language support for deployments that serve users in those languages.
Key insights
The new models in Microsoft Foundry each specialize in a different area: long-context reasoning, multilingual speech, and agentic tasks.
Principles
- Hybrid MoE architectures improve accuracy per parameter.
- Runtime keyword biasing enables domain adaptation without fine-tuning.
- Multi-token prediction enables native speculative decoding, which reduces per-token generation latency.
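The multi-token prediction principle is usually exploited through speculative decoding: a cheap drafter proposes several tokens, and the full model verifies them, keeping the longest agreeing prefix. The sketch below is a toy greedy version under stated assumptions. `toy_target` and `toy_draft` are hypothetical stand-ins for a real model and its drafter, and production systems verify all proposed positions in one batched forward pass instead of a Python loop.

```python
from typing import Callable, List

NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Baseline: one call to the model per generated token."""
    tokens = list(prompt)
    for _ in range(max_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy draft-and-verify; the result matches greedy decoding of `target`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    tokens = list(prompt)
    limit = len(prompt) + max_new
    while len(tokens) < limit:
        # 1. The drafter proposes up to k tokens autoregressively.
        proposal: List[int] = []
        for _ in range(min(k, limit - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each proposed position in order.
        accepted: List[int] = []
        for guess in proposal:
            expected = target(tokens + accepted)
            accepted.append(expected)  # always keep the target's own token
            if guess != expected:
                break                  # first mismatch: discard the rest
        tokens.extend(accepted)
    return tokens


def toy_target(tokens: List[int]) -> int:
    """Hypothetical 'large' model: counts upward modulo 10."""
    return (tokens[-1] + 1) % 10


def toy_draft(tokens: List[int]) -> int:
    """Hypothetical drafter: agrees with the target except after token 4."""
    return 0 if tokens[-1] == 4 else (tokens[-1] + 1) % 10


if __name__ == "__main__":
    fast = speculative_decode(toy_target, toy_draft, [0], max_new=12, k=4)
    slow = greedy_decode(toy_target, [0], max_new=12)
    print(fast == slow, fast)
```

The speedup comes from accepted drafts: every verification round yields at least one token and up to `k`, so the output is unchanged while the number of sequential target passes drops whenever the drafter guesses correctly.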
Method
NVIDIA's Latent MoE architecture combines Mamba-2 state-space layers, sparse MoE layers, and full-attention layers. In the MoE layers, tokens are projected into a smaller latent space, where expert routing and computation take place.
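One way to read the routing description above is sketched below in plain Python: project the token down to a latent vector, pick the top-k experts from router scores, mix their outputs, and project back up. This is an illustrative assumption, not NVIDIA's implementation. The class name `LatentMoELayer`, the dimensions, the linear experts, and the random weights are all hypothetical, and the Mamba-2 and attention layers are left out.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a row-major matrix by a vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(xs: Vector) -> Vector:
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows: int, cols: int, rng: random.Random) -> Matrix:
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


class LatentMoELayer:
    """Toy sparse MoE layer whose experts operate in a reduced latent space."""

    def __init__(self, d_model: int, d_latent: int, n_experts: int,
                 top_k: int, seed: int = 0) -> None:
        if not 1 <= top_k <= n_experts:
            raise ValueError("top_k must be between 1 and n_experts")
        rng = random.Random(seed)
        self.top_k = top_k
        self.down = random_matrix(d_latent, d_model, rng)      # model -> latent
        self.up = random_matrix(d_model, d_latent, rng)        # latent -> model
        self.router = random_matrix(n_experts, d_latent, rng)  # one score per expert
        self.experts = [random_matrix(d_latent, d_latent, rng)
                        for _ in range(n_experts)]

    def route(self, latent: Vector) -> List[Tuple[int, float]]:
        """Return (expert index, mixing weight) for the top-k experts."""
        scores = matvec(self.router, latent)
        ranked = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)
        chosen = ranked[: self.top_k]
        weights = softmax([scores[i] for i in chosen])
        return list(zip(chosen, weights))

    def forward(self, token: Vector) -> Vector:
        latent = matvec(self.down, token)
        mixed = [0.0] * len(latent)
        # Only the selected experts run, which is what makes the layer sparse.
        for index, weight in self.route(latent):
            expert_out = matvec(self.experts[index], latent)
            mixed = [m + weight * o for m, o in zip(mixed, expert_out)]
        return matvec(self.up, mixed)


if __name__ == "__main__":
    layer = LatentMoELayer(d_model=8, d_latent=4, n_experts=6, top_k=2)
    token = [0.1 * i for i in range(8)]
    print(layer.route(matvec(layer.down, token)))
    print(len(layer.forward(token)))
```

Because the experts here work on `d_latent`-sized vectors, each expert matrix is smaller than it would be at the full model width, which is the intuition behind doing the routed computation in a latent space.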
In practice
- Use Nemotron-3-Super for 1M-token RAG and code analysis.
- Employ Granite-4.0-1b-Speech for domain-specific ASR via keyword biasing.
- Leverage Sarvam-105B for agentic workflows in 22 Indian languages.
Topics
- NVIDIA Nemotron-3-Super-120B-A12B
- IBM Granite-4.0-1b-Speech
- Sarvam-105B
- Mixture-of-Experts
- Automatic Speech Recognition
Best for: AI Architect, AI Engineer, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published on the Microsoft Foundry Blog.