H100 to B200: What Actually Changed

Source: Machine Learning on Medium · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Emerging Technologies & Innovation) · Depth: Advanced, long

Summary

NVIDIA's Blackwell architecture, comprising the B200 GPU and the GB200 NVL72 system, marks a shift from the H100 generation: the focus moves from individual GPU performance to datacenter-scale AI infrastructure. The B200 uses a dual-die design with 208 billion transistors, 192 GB of HBM3e, 8 TB/s of memory bandwidth, and new FP4 support, and it delivers 2.1x-3.4x higher per-GPU throughput than the H200 in MLPerf v5.0 benchmarks. The GB200 NVL72 connects 72 B200 GPUs and 36 Grace CPUs through NVLink 5 and NVSwitch in a single liquid-cooled rack, forming a unified compute fabric with 13.5 TB of coherent memory and 130 TB/s of internal bandwidth. Because the rack presents its 72 GPUs as one accelerator, and the Grace CPUs and B200 GPUs share a unified memory address space, it achieves up to 30x faster inference for trillion-parameter LLMs than H100 systems. Power rises to 1,000 W per B200 GPU and 120 kW per NVL72 rack, which makes liquid cooling mandatory and is driving a rapid infrastructure transition in datacenters.
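
As a rough sanity check on the rack-level figures in the summary, the per-GPU numbers can be multiplied out across 72 GPUs. This is a back-of-envelope sketch, not a vendor calculation. The GPU count, HBM capacity, and power figures come from the summary; the 1.8 TB/s NVLink 5 per-GPU bandwidth is an assumption not stated in the summary, and `rack_totals` is a hypothetical helper name.

```python
# Figures taken from the summary.
GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 192
GPU_POWER_W = 1_000
RACK_POWER_KW = 120

# Assumption (not in the summary): NVLink 5 bandwidth per GPU, in TB/s.
NVLINK5_PER_GPU_TBPS = 1.8


def rack_totals() -> dict:
    """Multiply per-GPU figures out to a 72-GPU rack (hypothetical helper)."""
    hbm_tb = GPUS_PER_RACK * HBM_PER_GPU_GB / 1000          # decimal TB
    nvlink_tbps = GPUS_PER_RACK * NVLINK5_PER_GPU_TBPS      # aggregate TB/s
    gpu_power_kw = GPUS_PER_RACK * GPU_POWER_W / 1000       # GPUs only
    return {
        "hbm_tb": hbm_tb,
        "nvlink_tbps": nvlink_tbps,
        "gpu_power_kw": gpu_power_kw,
        "gpu_share_of_rack_power": gpu_power_kw / RACK_POWER_KW,
    }


if __name__ == "__main__":
    for name, value in rack_totals().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions the aggregate NVLink figure lands close to the quoted 130 TB/s, and the GPUs alone account for roughly 60% of the 120 kW rack budget. The nominal HBM product comes out slightly above the quoted 13.5 TB; the summary does not say how that figure was derived.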

Key takeaway

NVIDIA's Blackwell architecture shifts AI infrastructure from per-GPU scaling to datacenter-scale unification, built on dual-die B200 GPUs with 192 GB of HBM3e and FP4 support. Per GPU, the B200 offers 2.25x the sparse FP8 throughput and 2.4x the memory bandwidth of the H100, while the GB200 NVL72 system joins 72 B200s and 36 Grace CPUs over NVLink 5 into a 13.5 TB coherent memory fabric. This enables up to 30x faster trillion-parameter LLM inference by eliminating CPU-GPU bottlenecks, though the 120 kW power draw per rack mandates liquid cooling and forces a datacenter infrastructure transition.
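
The per-GPU ratios in the takeaway can be reproduced approximately from headline specifications. In this sketch only the B200's 8 TB/s memory bandwidth comes from the summary; the H100 SXM figures (3.35 TB/s, 3,958 sparse FP8 TFLOPS) and the B200 sparse FP8 figure (9,000 TFLOPS) are assumed values based on commonly cited datasheet numbers, and `speedup` is a hypothetical helper.

```python
# B200 memory bandwidth is from the summary; every other figure is an assumption.
B200_MEM_BW_TBPS = 8.0
H100_MEM_BW_TBPS = 3.35          # assumed H100 SXM figure
B200_FP8_SPARSE_TFLOPS = 9_000   # assumed
H100_FP8_SPARSE_TFLOPS = 3_958   # assumed H100 SXM figure


def speedup(new: float, old: float) -> float:
    """Return new/old rounded to two decimals (hypothetical helper)."""
    return round(new / old, 2)


if __name__ == "__main__":
    print("memory bandwidth ratio:", speedup(B200_MEM_BW_TBPS, H100_MEM_BW_TBPS))
    print("sparse FP8 ratio:", speedup(B200_FP8_SPARSE_TFLOPS, H100_FP8_SPARSE_TFLOPS))
```

With these inputs the ratios come out near the 2.4x and 2.25x quoted in the takeaway; small differences depend on which H100 variant and rounding the original article used.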

Best for: CTO, Director of AI/ML, Investor, AI Architect, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.