Inside the NVIDIA Rubin Platform: Six New Chips, One AI Supercomputer

2026-01-05 · Source: NVIDIA Technical Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Software Development & Engineering · Depth: Advanced, extended

Summary

NVIDIA has introduced the Rubin platform, a new architecture designed for "AI factories" that continuously convert power, silicon, and data into intelligence at scale. The platform targets AI workloads that now process hundreds of thousands of input tokens for long-context reasoning, complex workflows, and multimodal pipelines, while still serving real-time inference under strict constraints. Rubin employs "extreme co-design," treating the entire data center, rather than the individual GPU server, as the unit of compute. It integrates six new chips: the Vera CPU, Rubin GPU, NVLink 6 switch, ConnectX-9 SuperNIC, BlueField-4 DPU, and Spectrum-6 Ethernet switch, all architected to function as a single, coherent system. This approach aims to deliver sustained performance, lower cost per token, and better reliability, security, and energy efficiency for large-scale AI deployments.

Key takeaway

For CTOs and VPs of Engineering building or expanding AI factories, the NVIDIA Rubin platform offers a blueprint for industrial-scale intelligence production. Your teams should evaluate this co-designed, rack-scale architecture, which NVIDIA says can cut the number of GPUs needed to train a 10T MoE model by up to 75% and lower inference cost per token by up to 10x, with the goal of predictable performance and operational efficiency in demanding, always-on AI environments.
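To make the two headline figures concrete, here is a minimal arithmetic sketch. The "up to 75%" and "up to 10x" values come from the takeaway above. The baseline cluster size and baseline cost per token are purely hypothetical placeholders, not figures from NVIDIA, and the functions only show how a percentage reduction and a multiplicative factor relate.

```python
import math


def reduction_to_factor(reduction_fraction: float) -> float:
    """Convert a fractional reduction (0.75 = 75% fewer) to an 'N times fewer' factor."""
    return 1.0 / (1.0 - reduction_fraction)


def gpus_after_reduction(baseline_gpus: int, reduction_fraction: float) -> int:
    """GPUs still needed after cutting the baseline count by reduction_fraction."""
    return math.ceil(baseline_gpus * (1.0 - reduction_fraction))


def cost_after_factor(baseline_cost: float, factor: float) -> float:
    """Cost per token after an 'N times lower' improvement."""
    return baseline_cost / factor


# Hypothetical baseline values, chosen only for illustration.
BASELINE_GPUS = 1000          # assumed size of an existing training cluster
BASELINE_COST_PER_MTOK = 2.0  # assumed cost per million tokens, arbitrary units

# Best-case ("up to") figures from the takeaway.
factor = reduction_to_factor(0.75)
gpus = gpus_after_reduction(BASELINE_GPUS, 0.75)
cost = cost_after_factor(BASELINE_COST_PER_MTOK, 10.0)

print(f"75% fewer GPUs is the same as {factor:.0f}x fewer")
print(f"{BASELINE_GPUS} GPUs -> {gpus} GPUs")
print(f"{BASELINE_COST_PER_MTOK} -> {cost} per million tokens")
```

Because both claims are stated as "up to", these numbers are best-case bounds. A real evaluation would substitute measured baselines from your own workloads.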

Key insights

NVIDIA's Rubin platform uses extreme co-design to optimize entire data centers for continuous, scalable AI intelligence production.

Principles

Treat the data center as the unit of compute.
Co-design all components for sustained performance.
Prioritize efficiency across compute, memory, and communication.

Method

The Rubin platform integrates six specialized chips (a CPU, a GPU, a DPU, and three networking chips) into a rack-scale architecture. It relies on extreme co-design and liquid cooling to optimize for sustained AI intelligence production rather than peak component performance.

In practice

Utilize rack-scale systems for improved AI factory efficiency.
Implement liquid cooling to enhance power efficiency and stability.
Adopt full-stack confidential computing for secure AI workloads.

Topics

NVIDIA Rubin Platform
AI Factories
Rack-Scale Architecture
GPU Computing
Confidential Computing

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Architect, MLOps Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.