Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer
Summary
NVIDIA has introduced the Rubin platform, a new architecture designed for "AI factories" that continuously convert power, silicon, and data into intelligence at scale. The platform targets the changing demands of AI workloads, which now process hundreds of thousands of input tokens for long-context reasoning, complex workflows, and multimodal pipelines while still delivering real-time inference under strict constraints. Rubin relies on "extreme co-design," treating the entire data center, rather than the individual GPU server, as the unit of compute. It integrates six new chips (the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch), all architected to function as a single, coherent system. The goal is sustained performance, lower cost per token, and better reliability, security, and energy efficiency for large-scale AI deployments.
Key takeaway
For CTOs and VPs of Engineering building or expanding AI factories, the NVIDIA Rubin platform offers a blueprint for producing intelligence at industrial scale. Teams should evaluate this co-designed, rack-scale architecture against NVIDIA's headline claims: up to 75% fewer GPUs to train a 10T MoE model and up to 10x lower inference cost per token, together with predictable performance and operational efficiency in demanding, always-on AI environments.
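As a rough illustration of what these ratios mean for capacity planning, the sketch below turns them into fleet arithmetic. A 75% reduction means needing as few as one quarter of the baseline GPUs, and a 10x cost improvement divides the cost per token by ten. The function names and the baseline figures (1,000 GPUs, $2.00 per million tokens) are hypothetical; only the 75% and 10x ratios come from the summary, and both are best-case upper bounds.

```python
# Hypothetical fleet arithmetic. Baseline numbers are illustrative assumptions,
# not NVIDIA figures; only the ratios (up to 75% fewer GPUs, up to 10x lower
# cost per token) come from the summary, and both are best-case values.

def gpus_needed(baseline_gpus: int, reduction_pct: int) -> int:
    """GPUs left after cutting a baseline fleet by reduction_pct percent.

    Uses integer ceiling division so partial GPUs round up and no
    floating-point error creeps into the count.
    """
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return (baseline_gpus * (100 - reduction_pct) + 99) // 100


def cost_per_million_tokens(baseline_cost: float, improvement_factor: float) -> float:
    """Cost per million tokens after an N-x reduction in cost per token."""
    if improvement_factor <= 0:
        raise ValueError("improvement_factor must be positive")
    return baseline_cost / improvement_factor


if __name__ == "__main__":
    baseline_gpus = 1_000  # hypothetical training fleet size
    baseline_cost = 2.00   # hypothetical USD per million tokens
    print(gpus_needed(baseline_gpus, 75))                  # → 250 (one quarter)
    print(cost_per_million_tokens(baseline_cost, 10.0))    # → 0.2
```

Integer percentages keep the GPU count exact; with a float such as `1.0 - 0.7`, rounding error can push a ceiling up by one GPU.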
Key insights
NVIDIA's Rubin platform uses extreme co-design to optimize entire data centers for continuous, scalable AI intelligence production.
Principles
- Treat the data center as the unit of compute.
- Co-design all components for sustained performance.
- Prioritize efficiency across compute, memory, and communication.
Method
The Rubin platform integrates six specialized chips (a CPU, a GPU, a DPU, and three networking chips) into a rack-scale architecture, using extreme co-design and liquid cooling to optimize for sustained AI intelligence production rather than peak component performance.
In practice
- Use rack-scale systems to improve AI factory efficiency.
- Implement liquid cooling to enhance power efficiency and stability.
- Adopt full-stack confidential computing for secure AI workloads.
Topics
- NVIDIA Rubin Platform
- AI Factories
- Rack-Scale Architecture
- GPU Computing
- Confidential Computing
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Architect, MLOps Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.