H100 to B200: What Actually Changed
Summary
NVIDIA's Blackwell generation, comprising the B200 GPU and the GB200 NVL72 system, shifts the focus from individual GPU performance to datacenter-scale AI infrastructure. The B200 uses a dual-die design with 208 billion transistors, 192 GB of HBM3e, 8 TB/s of memory bandwidth, and new FP4 support, and it delivers 2.1x-3.4x per-GPU throughput gains over the H200 in MLPerf v5.0 benchmarks. The GB200 NVL72 connects 72 B200 GPUs and 36 Grace CPUs through NVLink 5 and NVSwitch in a single liquid-cooled rack, forming one compute fabric with 13.5 TB of coherent memory and 130 TB/s of internal bandwidth. Because the rack's 72 GPUs behave as one and the Grace CPUs and B200 GPUs share a unified memory address space, the system achieves up to 30x faster inference on trillion-parameter LLMs than H100 systems. Power draw rises to 1,000 W per B200 GPU and 120 kW per NVL72 rack, which makes liquid cooling mandatory and is forcing a rapid infrastructure transition in datacenters.
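The rack-level figures in the summary can be roughly reproduced from the per-GPU numbers. The sketch below is a back-of-the-envelope check, not a statement of how NVIDIA derives its specifications. The 1.8 TB/s NVLink 5 per-GPU bandwidth is an assumption taken from NVIDIA's published specifications rather than from the summary, and the function name `rack_totals` is illustrative.

```python
# Figures quoted in the summary.
GPUS_PER_RACK = 72
HBM_PER_GPU_GB = 192
GPU_POWER_W = 1_000
RACK_POWER_KW = 120

# Assumption (not stated in the summary): NVLink 5 bandwidth per GPU, in TB/s.
NVLINK5_PER_GPU_TBPS = 1.8


def rack_totals():
    """Multiply per-GPU figures by the GPU count to get rack-level totals."""
    hbm_tb = GPUS_PER_RACK * HBM_PER_GPU_GB / 1000        # decimal terabytes
    nvlink_tbps = GPUS_PER_RACK * NVLINK5_PER_GPU_TBPS    # aggregate NVLink bandwidth
    gpu_power_kw = GPUS_PER_RACK * GPU_POWER_W / 1000     # GPUs only, no CPUs or switches
    return {
        "hbm_tb": hbm_tb,
        "nvlink_tbps": nvlink_tbps,
        "gpu_power_kw": gpu_power_kw,
        "gpu_share_of_rack_power": gpu_power_kw / RACK_POWER_KW,
    }


totals = rack_totals()
print(f"Nominal HBM3e per rack:  {totals['hbm_tb']:.1f} TB")
print(f"Aggregate NVLink 5:      {totals['nvlink_tbps']:.1f} TB/s")
print(f"GPU power per rack:      {totals['gpu_power_kw']:.0f} kW")
print(f"GPU share of rack power: {totals['gpu_share_of_rack_power']:.0%}")
```

Under these assumptions the aggregate NVLink figure lands close to the quoted 130 TB/s, and the GPUs alone account for about 72 kW of the 120 kW rack budget. The nominal HBM3e product comes out slightly above the quoted 13.5 TB; the summary does not explain the gap, so treat the quoted value as the authoritative one.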
Key takeaway
NVIDIA's Blackwell B200 pairs a dual-die GPU with 192 GB of HBM3e and FP4 support, and is designed to be unified at datacenter scale. Per GPU, the B200 offers 2.25x the sparse FP8 throughput and 2.4x the memory bandwidth of the H100, while the GB200 NVL72 joins 72 B200s and 36 Grace CPUs over NVLink 5 into a 13.5 TB coherent memory fabric. This enables up to 30x faster inference on trillion-parameter LLMs by removing CPU-GPU bottlenecks, but the 120 kW power draw per rack mandates liquid cooling and is driving a datacenter infrastructure transition.
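The 2.4x memory bandwidth figure in the takeaway can be sanity-checked with a short calculation. This is a sketch under stated assumptions: the H100 baseline values (3.35 TB/s and 80 GB, the H100 SXM configuration) are not given in this summary and are taken from NVIDIA's public datasheet, and the helper name `generational_ratio` is illustrative.

```python
# B200 figures quoted in the summary.
B200_MEM_BW_TBPS = 8.0
B200_HBM_GB = 192

# Assumptions (not stated in the summary): H100 SXM baseline values.
H100_MEM_BW_TBPS = 3.35
H100_HBM_GB = 80


def generational_ratio(new_value, old_value):
    """Return how many times larger new_value is than old_value."""
    return new_value / old_value


bandwidth_ratio = generational_ratio(B200_MEM_BW_TBPS, H100_MEM_BW_TBPS)
capacity_ratio = generational_ratio(B200_HBM_GB, H100_HBM_GB)

print(f"Memory bandwidth, B200 vs H100: {bandwidth_ratio:.2f}x")
print(f"Memory capacity,  B200 vs H100: {capacity_ratio:.2f}x")
```

With these baselines the bandwidth ratio rounds to the quoted 2.4x, and memory capacity happens to grow by about the same factor. A different H100 variant (for example the PCIe card) would give a different baseline and therefore a different ratio.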
Topics
- Blackwell Architecture
- NVLink
- Datacenter Infrastructure
- LLM Inference
- HBM3e
Best for: CTO, Director of AI/ML, Investor, AI Architect, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning on Medium.