Introduction to profiling tools for AMD hardware
Summary
AMD provides a suite of profiling tools designed to help developers optimize applications for AMD hardware, including "Zen" Core CPUs, RDNA™ GPUs, and CDNA™ accelerators. The article introduces key tools such as rocprofiler-sdk, rocprofv3, rocprof-sys, rocprof-compute, Radeon™ GPU Profiler, and AMD uProf, detailing their specific capabilities and the architectures and operating systems each supports. It emphasizes that efficient application performance requires more than benchmarking execution time: developers need to understand where a program spends its time and identify its bottlenecks. The article outlines a decision-making framework based on three profiling objectives: identifying hot spots, assessing hardware utilization (e.g., through Roofline Analysis), and understanding the root causes of observed performance via hardware metrics. It also notes several superseded tools and highlights their modern replacements.
Key takeaway
AI engineers and HPC developers optimizing applications on AMD hardware should select a profiling tool based on their specific objective and target architecture. Start by identifying performance hot spots with timeline tracing tools such as rocprof-sys or rocprofv3. Then use rocprof-compute for detailed kernel analysis, or AMD uProf for broader system insights, to diagnose the root causes of performance bottlenecks and confirm that the hardware is being used efficiently.
Key insights
AMD offers diverse profiling tools to optimize application performance across its CPU and GPU architectures.
Principles
- Benchmarking execution time alone is not enough; profiling shows where a program spends its time.
- Profiling tools vary by architecture and OS.
- Identify hot spots before optimizing.
Method
To profile, first identify hot spots with timeline traces (rocprof-sys, rocprofv3). Then, assess hardware utilization via Roofline Analysis (rocprof-compute, AMD uProf). Finally, collect hardware metrics to understand the root causes of the observed performance (rocprofv3, rocprof-sys, rocprof-compute, AMD uProf).
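The Roofline Analysis step can be illustrated with a small calculation. The sketch below uses the standard roofline bound, where attainable performance is the smaller of the compute peak and the memory bandwidth multiplied by arithmetic intensity. It is not taken from the original article or from any AMD tool; the function names and the peak numbers are hypothetical and do not describe any specific AMD device.

```python
def attainable_gflops(peak_gflops: float,
                      peak_bandwidth_gbs: float,
                      arithmetic_intensity: float) -> float:
    """Roofline bound in GFLOP/s.

    arithmetic_intensity is in FLOP per byte moved, so
    bandwidth (GB/s) * intensity (FLOP/byte) gives GFLOP/s.
    """
    if min(peak_gflops, peak_bandwidth_gbs, arithmetic_intensity) <= 0:
        raise ValueError("all inputs must be positive")
    compute_roof = peak_gflops
    memory_roof = peak_bandwidth_gbs * arithmetic_intensity
    return min(compute_roof, memory_roof)


def is_memory_bound(peak_gflops: float,
                    peak_bandwidth_gbs: float,
                    arithmetic_intensity: float) -> bool:
    """True when the kernel sits left of the ridge point of the roofline."""
    ridge_point = peak_gflops / peak_bandwidth_gbs  # FLOP/byte
    return arithmetic_intensity < ridge_point


# Hypothetical device: 1000 GFLOP/s peak compute, 100 GB/s peak bandwidth.
PEAK_GFLOPS = 1000.0
PEAK_BANDWIDTH_GBS = 100.0

# A kernel doing 2 FLOP per byte is limited by memory bandwidth.
low_intensity_bound = attainable_gflops(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS, 2.0)

# A kernel doing 50 FLOP per byte is limited by the compute peak.
high_intensity_bound = attainable_gflops(PEAK_GFLOPS, PEAK_BANDWIDTH_GBS, 50.0)
```

Comparing a kernel's measured throughput against this bound indicates how far it is from the hardware limit and whether memory traffic or compute is the likelier constraint.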
In practice
- Use rocprofv3 for GPU hot spot analysis.
- Employ rocprof-sys for unified CPU/GPU traces.
- Apply rocprof-compute for Instinct™ GPU kernel analysis.
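The objective-to-tool mapping described in the Method section can be written down as a small lookup table. This is a minimal sketch for illustration only: the tool names come from the summary above, while the objective keys and the suggest_tools function are hypothetical and are not part of any AMD package.

```python
# Mapping of profiling objective to the tools named in the summary.
# The dictionary keys are illustrative labels, not official terminology.
OBJECTIVE_TOOLS = {
    "hot_spots": ["rocprof-sys", "rocprofv3"],
    "hardware_utilization": ["rocprof-compute", "AMD uProf"],
    "root_cause_metrics": [
        "rocprofv3",
        "rocprof-sys",
        "rocprof-compute",
        "AMD uProf",
    ],
}

# Suggested order of objectives, following the Method section.
WORKFLOW_ORDER = ["hot_spots", "hardware_utilization", "root_cause_metrics"]


def suggest_tools(objective: str) -> list[str]:
    """Return the tools the summary associates with a profiling objective."""
    try:
        # Return a copy so callers cannot mutate the shared table.
        return list(OBJECTIVE_TOOLS[objective])
    except KeyError:
        known = ", ".join(WORKFLOW_ORDER)
        raise ValueError(
            f"unknown objective {objective!r}; expected one of: {known}"
        ) from None


if __name__ == "__main__":
    for step, objective in enumerate(WORKFLOW_ORDER, start=1):
        print(f"{step}. {objective}: {', '.join(suggest_tools(objective))}")
```

Tool choice also depends on the target architecture and operating system, which this table does not encode.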
Topics
- ROCm Profiling
- AMD CDNA Architecture
- GPU Performance Analysis
- Roofline Analysis
- rocprof-sys
Best for: Machine Learning Engineer, AI Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.