NVIDIA Vera Rubin POD: Seven Chips, Five Rack-Scale Systems, One AI Supercomputer

2026-03-19 · Source: NVIDIA Technical Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Emerging Technologies & Innovation · Depth: Expert, long

Summary

NVIDIA has introduced the Vera Rubin POD, five specialized rack-scale systems built on the third-generation NVIDIA MGX rack architecture and designed for the growing demands of agentic AI workloads. A full POD spans 40 racks and comprises 1.2 quadrillion transistors, nearly 20,000 NVIDIA dies, and 1,152 NVIDIA Rubin GPUs, delivering 60 exaflops and 10 PB/s of total scale-up bandwidth. The five systems are the NVL72 compute engine for AI scaling laws, Groq 3 LPX for low-latency inference, Vera CPU racks for agentic AI and reinforcement learning sandboxing, BlueField-4 STX for AI-native context memory storage, and Spectrum-6 SPX networking racks. They are co-designed to function as a single AI supercomputer, with energy efficiency, resiliency, and scalability supported by innovations such as Intelligent Power Smoothing and dynamic Max-Q power provisioning.
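
The headline figures can be cross-checked with simple arithmetic. The sketch below is illustrative only. It assumes that "NVL72" means 72 GPUs per rack and, for the per-GPU figure, that all 60 exaflops are attributable to the Rubin GPUs; the summary confirms neither, and it does not state the numeric precision behind the exaflops figure. The function and constant names are hypothetical.

```python
# Headline figures quoted in the summary.
TOTAL_RACKS = 40
TOTAL_GPUS = 1_152
TOTAL_TRANSISTORS = 1_200_000_000_000_000  # 1.2 quadrillion
TOTAL_DIES = 20_000                        # "nearly 20,000", so an upper bound
TOTAL_EXAFLOPS = 60

# Assumption (not stated in the summary): an NVL72 rack holds 72 GPUs.
GPUS_PER_NVL72_RACK = 72


def pod_breakdown():
    """Derive per-unit figures from the POD-level totals."""
    nvl72_racks, gpus_left_over = divmod(TOTAL_GPUS, GPUS_PER_NVL72_RACK)
    return {
        "nvl72_racks": nvl72_racks,
        "gpus_left_over": gpus_left_over,
        # Racks remaining for the other four system types.
        "other_racks": TOTAL_RACKS - nvl72_racks,
        # Lower bound on the average, since the die count is rounded up.
        "avg_transistors_per_die": TOTAL_TRANSISTORS / TOTAL_DIES,
        # Assumption: every exaflop is delivered by a Rubin GPU.
        "petaflops_per_gpu": TOTAL_EXAFLOPS * 1_000 / TOTAL_GPUS,
    }


if __name__ == "__main__":
    for key, value in pod_breakdown().items():
        print(f"{key}: {value:,.2f}")
```

Under these assumptions the GPUs fill exactly 16 NVL72 racks, leaving 24 racks for the other four system types. The average die carries at least about 60 billion transistors, and each GPU contributes roughly 52 petaflops.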

Key takeaway

For CTOs and VPs of Engineering planning next-generation AI infrastructure, the NVIDIA Vera Rubin POD offers a blueprint for building efficient, scalable, and resilient AI factories. Teams should evaluate this co-designed, rack-scale approach for agentic AI workloads, where it may deliver better performance per watt and lower token costs than prior generations such as Blackwell.

Key insights

NVIDIA's Vera Rubin POD integrates five specialized rack systems into a cohesive AI supercomputer for agentic AI.

Principles

Co-design chips, systems, and software for AI factories.
Prioritize energy efficiency from chip to grid.
Use modular, cable-free designs to improve reliability and serviceability.

Method

The Vera Rubin POD integrates five distinct rack-scale systems (NVL72, Groq 3 LPX, Vera CPU, BlueField-4 STX, Spectrum-6 SPX) via the MGX rack architecture to form a single AI supercomputer, optimizing for agentic AI workloads.
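
The division of labor among the five systems can be pictured as a small lookup table. This is a minimal sketch for orientation, not an NVIDIA API: the `RackSystem` type and `systems_for` helper are hypothetical names, and the role strings paraphrase this summary.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackSystem:
    name: str
    role: str


# The five rack-scale systems and their roles as described in the summary.
POD_SYSTEMS = (
    RackSystem("NVL72", "compute engine"),
    RackSystem("Groq 3 LPX", "low-latency inference"),
    RackSystem("Vera CPU", "agentic AI and reinforcement learning sandboxing"),
    RackSystem("BlueField-4 STX", "AI-native context memory storage"),
    RackSystem("Spectrum-6 SPX", "networking"),
)


def systems_for(keyword):
    """Return the names of systems whose role mentions the keyword."""
    needle = keyword.lower()
    return [s.name for s in POD_SYSTEMS if needle in s.role.lower()]


if __name__ == "__main__":
    print(systems_for("inference"))
    print(systems_for("storage"))
```

Searching by role keyword gives a quick way to see which rack type a given workload stage would land on in this model.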

In practice

Use 45°C liquid cooling to reduce power usage effectiveness (PUE).
Implement dynamic Max-Q power provisioning for GPU capacity.
Employ rack-level energy storage to cushion power transients.
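
For the cooling item above, PUE is the ratio of total facility power to IT equipment power, so any reduction in cooling overhead moves it toward the ideal value of 1.0. The sketch below shows the calculation. The kilowatt figures are invented for illustration and do not come from NVIDIA or the source article.

```python
def pue(total_facility_kw, it_equipment_kw):
    """Power usage effectiveness: total facility power / IT equipment power."""
    if it_equipment_kw <= 0:
        raise ValueError("IT equipment power must be positive")
    if total_facility_kw < it_equipment_kw:
        raise ValueError("total facility power cannot be below IT power")
    return total_facility_kw / it_equipment_kw


if __name__ == "__main__":
    it_load_kw = 1_000  # hypothetical IT load

    # Hypothetical cooling and distribution overheads, in kW.
    for label, overhead_kw in [("higher overhead", 400), ("lower overhead", 150)]:
        ratio = pue(it_load_kw + overhead_kw, it_load_kw)
        print(f"{label}: PUE = {ratio:.2f}")
```

With these made-up numbers, cutting overhead from 400 kW to 150 kW on a 1,000 kW IT load lowers PUE from 1.40 to 1.15.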

Topics

Agentic AI
NVIDIA Vera Rubin POD
NVIDIA MGX Architecture
AI Supercomputing
Data Center Infrastructure

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Architect, MLOps Engineer, AI Hardware Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.