NVIDIA NVbandwidth: Your Essential Tool for Measuring GPU Interconnect and Memory Performance
Summary
NVIDIA NVbandwidth is a CUDA-based tool that measures memory bandwidth and latency across a range of memory copy patterns on single-GPU, multi-GPU, and multi-node NVIDIA systems. It performs copies with either the copy engine (CE) or copy kernels and reports the bandwidth currently measured on the system, covering transfers between CPU memory and GPU memory as well as between GPUs. The tool supports unidirectional, bidirectional, multi-GPU, and multi-node tests, along with latency tests. NVbandwidth is topology-agnostic: it works across NVLink, NVLink C2C, and PCIe interconnects, and it can write results as plain text or JSON. It requires a CUDA-enabled NVIDIA GPU, the CUDA Toolkit (11.x or later for single-node, 12.3 or later for multi-node), a compatible NVIDIA display driver, a C++17 compiler, CMake 3.20 or later, and the Boost program_options library.
Key takeaway
For ML infrastructure engineers and system architects evaluating or optimizing GPU deployments, NVbandwidth provides critical metrics for data transfer performance. You should integrate NVbandwidth into your validation workflows to benchmark new hardware, identify bottlenecks in existing systems, and perform regression testing after software or driver updates. This ensures your CUDA applications achieve optimal data movement and overall system efficiency.
Key insights
NVbandwidth measures GPU memory transfer performance to optimize CUDA applications and validate system configurations.
Principles
- Memory bandwidth is critical for GPU application performance.
- Data transfer patterns impact overall system efficiency.
Method
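NVbandwidth first enqueues a spin kernel that holds the stream, then enqueues a start event, multiple memcpy iterations, and a stop event. Because none of the timed work can begin until the spin kernel is released, the overhead of enqueuing the operations is excluded from the measurement.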
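As a rough illustration of how a timed window like this becomes a bandwidth figure, the sketch below divides the bytes moved by the time elapsed between the start and stop events. The function name, its parameters, and the decimal GB convention (1 GB = 1e9 bytes) are assumptions made for this example; they are not taken from the NVbandwidth source.

```python
def bandwidth_gb_per_s(buffer_bytes: int, iterations: int, elapsed_ms: float) -> float:
    """Average bandwidth in GB/s (assuming 1 GB = 1e9 bytes) over a timed window.

    buffer_bytes: size of one copy, in bytes
    iterations:   number of copies between the start and stop events
    elapsed_ms:   time between the two events, in milliseconds
    """
    if elapsed_ms <= 0:
        raise ValueError("elapsed_ms must be positive")
    total_bytes = buffer_bytes * iterations
    elapsed_s = elapsed_ms / 1000.0
    return total_bytes / elapsed_s / 1e9


# Hypothetical numbers: 16 copies of a 512 MiB buffer timed at 200 ms.
example = bandwidth_gb_per_s(512 * 1024 * 1024, 16, 200.0)
print(f"{example:.2f} GB/s")
```

Timing several iterations inside one window averages out per-copy jitter, which is one reason a benchmark might prefer it over timing a single copy.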
In practice
- Use NVbandwidth to diagnose CUDA application bandwidth bottlenecks.
- Compare bandwidth across GPUs to evaluate system upgrades.
- Run regression tests to detect performance changes after updates.
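One way to automate the regression check described above is to compare a saved baseline against a fresh run and flag any test that drops by more than a chosen tolerance. This is a minimal sketch: the dictionary layout, the test labels, the 5% threshold, and the figures are all hypothetical, and it does not assume anything about the structure of NVbandwidth's actual JSON output.

```python
def find_regressions(baseline, current, tolerance=0.05):
    """Return {test_name: fractional_drop} for tests that fell more than
    `tolerance` below their baseline bandwidth."""
    regressions = {}
    for test_name, base_gbps in baseline.items():
        new_gbps = current.get(test_name)
        if new_gbps is None or base_gbps <= 0:
            continue  # missing or unusable measurement; nothing to compare
        drop = (base_gbps - new_gbps) / base_gbps
        if drop > tolerance:
            regressions[test_name] = round(drop, 4)
    return regressions


# Made-up figures in GB/s, keyed by illustrative labels (not real test names).
baseline = {"host_to_device": 55.0, "device_to_host": 55.0, "device_to_device": 250.0}
after_update = {"host_to_device": 54.5, "device_to_host": 44.0, "device_to_device": 251.0}

flagged = find_regressions(baseline, after_update)
print(flagged)
```

The same comparison works for evaluating an upgrade: pass the old system's results as the baseline and swap the sign of the check if improvements, rather than drops, are what you want to surface.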
Topics
- NVIDIA NVbandwidth
- GPU Performance Optimization
- Memory Bandwidth Measurement
- Multi-GPU Systems
- Multi-Node GPU Deployments
Best for: Machine Learning Engineer, AI Architect, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.