Building and Deploying Custom hipBLASLt Libraries on AMD Instinct GPUs

2026-06-18 · Source: AMD ROCm Blogs · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure · Depth: Advanced, long

Summary

This article outlines a workflow for building and deploying custom "hipBLASLt" libraries on AMD Instinct GPUs, where General Matrix Multiply (GEMM) performance matters for generative AI workloads. It covers cases where the stock ROCm binaries in "/opt/rocm" are not enough: testing bug fixes, validating TensileLite-tuned kernels, or evaluating new architectures such as the AMD Instinct MI300X ("gfx942"). The process has three stages: prepare an Ubuntu/Debian or RHEL/RPM build environment, fetch the "rocm-libraries" source at a pinned commit ("1784d40186"), and compile "hipBLASLt" for the target architectures with "install.sh -a". For deployment, either generate ".deb" or ".rpm" packages for cluster-wide distribution, or point "LD_LIBRARY_PATH" at the build output for isolated testing that needs no root privileges and avoids dependency conflicts with the system installation.

Key takeaway

MLOps engineers deploying generative AI workloads on AMD Instinct GPUs should adopt this workflow when they need a specific "hipBLASLt" version or custom TensileLite kernels. Building architecture-specific packages, or selecting the library at runtime with "LD_LIBRARY_PATH", gives fine-grained control over GEMM performance and avoids system-wide dependency conflicts. Deployments stay reproducible and traceable, and they are built for the target hardware, such as the MI300X ("gfx942"), which can improve end-to-end latency and throughput.

Key insights

Custom "hipBLASLt" builds offer precise GPU optimization and flexible deployment for AMD Instinct GPUs, avoiding system-wide conflicts.

Principles

Avoid "sudo make install" in multi-user environments.
Target specific GPU architectures to optimize builds.
Document source revision and build logs for traceability.
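
One way to act on the traceability principle is to write a small manifest next to each build. The sketch below is illustrative only: the function name, the field names, and the log path are hypothetical and do not come from the original article. Only the commit "1784d40186" and the architecture "gfx942" are taken from the summary.

```python
import json
from datetime import datetime, timezone


def build_manifest(commit, arch, log_path, built_at=None):
    """Return a JSON string recording what was built and where its log lives.

    All field names here are hypothetical; adapt them to your own conventions.
    """
    built_at = built_at or datetime.now(timezone.utc)
    record = {
        "library": "hipBLASLt",
        "source_repo": "rocm-libraries",
        "source_commit": commit,
        "gpu_arch": arch,
        "build_log": log_path,  # hypothetical log location
        "built_at_utc": built_at.isoformat(),
    }
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(build_manifest("1784d40186", "gfx942", "build/hipblaslt-gfx942.log"))
```

Storing this file alongside the generated package or the library directory lets anyone trace a deployed binary back to its source revision and build log.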

Method

Prepare the OS build environment, clone "rocm-libraries" at commit "1784d40186", and build "hipBLASLt" for specific architectures such as "gfx942" using "install.sh -a". Then either package the result as ".deb"/".rpm" or select it at runtime with "LD_LIBRARY_PATH".
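
The fetch-and-build part of the method can be sketched as a list of commands that is assembled and printed for review rather than executed. The repository URL and the "projects/hipblaslt" subdirectory are assumptions not stated in this summary, so check them against the original article. "install.sh" may also need flags beyond the "-a" option mentioned here.

```python
import shlex


def build_plan(commit="1784d40186", arch="gfx942",
               repo_url="https://github.com/ROCm/rocm-libraries.git",  # assumed URL
               checkout_dir="rocm-libraries",
               project_subdir="projects/hipblaslt"):                   # assumed path
    """Return the fetch-and-build steps as (working_dir, argv) pairs."""
    project_dir = f"{checkout_dir}/{project_subdir}"
    return [
        (".", ["git", "clone", repo_url, checkout_dir]),
        (checkout_dir, ["git", "checkout", commit]),        # pin the source revision
        (project_dir, ["./install.sh", "-a", arch]),        # build for one architecture
    ]


def render(plan):
    """Format the plan as shell lines so it can be reviewed before running."""
    return "\n".join(
        f"(cd {shlex.quote(cwd)} && {shlex.join(argv)})" for cwd, argv in plan
    )


if __name__ == "__main__":
    print(render(build_plan()))
```

Keeping the commit and architecture as parameters makes it easy to log exactly what was requested for each build.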

In practice

Use "LD_LIBRARY_PATH" for local testing.
Generate ".deb"/".rpm" for cluster deployment.
Verify loaded library with "ldd" and "hipblaslt-bench".
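
The runtime-selection and verification steps above can be sketched as two small helpers: one places a custom library directory first on "LD_LIBRARY_PATH", and the other reads "ldd" output to see which "libhipblaslt" file was resolved. The function names and the example paths are hypothetical. The parser assumes the usual "name => path (address)" layout of "ldd" output.

```python
import os


def with_library_path(lib_dir, env=None):
    """Return a copy of env with lib_dir placed first on LD_LIBRARY_PATH."""
    env = dict(os.environ if env is None else env)
    current = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (":" + current if current else "")
    return env


def resolved_path(ldd_output, soname_prefix="libhipblaslt.so"):
    """Return the path ldd reports for the first matching library, or None."""
    for line in ldd_output.splitlines():
        name, arrow, target = line.strip().partition(" => ")
        if arrow and name.startswith(soname_prefix):
            path = target.split(" (")[0].strip()  # drop the load address
            return None if path == "not found" else path
    return None


if __name__ == "__main__":
    # Hypothetical build output directory; substitute your own.
    env = with_library_path("/home/user/hipblaslt/build/release/library")
    print(env["LD_LIBRARY_PATH"])
```

In use, the returned environment would be passed to something like subprocess.run(["ldd", binary], env=env, capture_output=True, text=True), with the captured output fed to resolved_path to confirm the custom build is loaded rather than the copy under "/opt/rocm".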

Topics

hipBLASLt
AMD Instinct GPUs
GPU Optimization
Generative AI
ROCm
Software Deployment

Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.