OpenXLA and JAX - ROCm Support and the State of CI
Summary
The OpenXLA compiler stack, comprising XLA and JAX, now supports AMD ROCm upstream, so JAX programs compile and run on AMD Instinct GPUs. The integration is backed by over 300 commits in the past year, which added Triton on AMDGPU for fused matmul-plus-epilogue patterns, hipBLASLt group-GEMM, FP8 fast accumulation on ROCm 7, and rocPRIM integration. XLA builds now use a hermetic LLVM toolchain and JAX ships `manylinux_2_28` wheels, which improves reproducibility and distribution. A unified CI pipeline validates every XLA and JAX pull request on real AMD Instinct MI300 and MI350 silicon, with cross-repository pre-flighting to keep the two projects compatible. Together, these changes make `pip install "jax[rocm7-local]"` a first-class entry point for users.
Key takeaway
AI engineers and researchers evaluating compiler stacks for AMD Instinct MI300/MI350 capacity should consider OpenXLA and JAX as a production-ready option. Upstream ROCm support, including Triton on AMDGPU and FP8 capabilities, targets strong performance for transformer workloads. You can install it with `pip install "jax[rocm7-local]"` and contribute to its ongoing development by reporting performance or numerical issues, with HLO dumps attached.
Key insights
OpenXLA and JAX now offer first-class, upstream ROCm support, validated by comprehensive CI on AMD Instinct hardware.
Principles
- Whole-program compilation optimizes HLO graphs.
- SPMD parallelism simplifies multi-device code.
- Upstream CI prevents bit rot.
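The first two principles can be illustrated with a small sketch. It is not taken from the original article: the function `mlp_layer`, the array shapes, and the mesh axis name `"data"` are assumptions chosen for illustration. The code traces a function into a single program for XLA to optimize as a whole, then runs the same unmodified function on a batch sharded across whatever devices are visible.

```python
import jax
import jax.numpy as jnp
import numpy as np
from jax.sharding import Mesh, NamedSharding, PartitionSpec as P


def mlp_layer(x, w, b):
    # Matmul followed by an elementwise epilogue (bias add + GELU).
    return jax.nn.gelu(x @ w + b)


n_devices = len(jax.devices())
batch = 4 * n_devices  # keep the batch divisible by the device count

x = jnp.ones((batch, 16), dtype=jnp.float32)
w = jnp.ones((16, 4), dtype=jnp.float32)
b = jnp.zeros((4,), dtype=jnp.float32)

# Whole-program compilation: the function is traced once and handed to XLA
# as a single graph instead of being executed op by op.
jitted = jax.jit(mlp_layer)
lowered = jitted.lower(x, w, b)
program_text = lowered.as_text()              # program before XLA optimizations
optimized_text = lowered.compile().as_text()  # HLO after XLA optimizations

# SPMD: describe how the data is laid out across devices; the function body
# itself does not change. The axis name "data" is an illustrative choice.
mesh = Mesh(np.array(jax.devices()), axis_names=("data",))
x_sharded = jax.device_put(x, NamedSharding(mesh, P("data")))
y = jitted(x_sharded, w, b)

print(y.shape, y.sharding)
```

On a single-device machine the mesh has one entry and the sharded call behaves like the unsharded one, so the sketch runs unchanged on a CPU-only install as well as on a multi-GPU host.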
In practice
- Use `pip install "jax[rocm7-local]"` for ROCm 7.
- Build custom wheels for older `gfx` targets.
- File performance reports with HLO dumps.
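As a rough sketch of the last item, the helper below gathers the kind of material a performance or numerical report might include: the JAX version, the backend and devices JAX actually selected, and the program text before and after XLA optimization. The helper name `collect_report`, the file names, and the example function `attention_scores` are hypothetical and not part of JAX or the original article. XLA can also write dumps automatically when the `XLA_FLAGS` environment variable contains `--xla_dump_to=<dir>` before the process starts.

```python
import json
import pathlib
import tempfile

import jax
import jax.numpy as jnp


def collect_report(fn, *args, out_dir):
    """Write environment info and program text for fn(*args) into out_dir.

    Hypothetical helper for illustration; not a JAX API.
    """
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    lowered = jax.jit(fn).lower(*args)
    compiled = lowered.compile()

    env = {
        "jax_version": jax.__version__,
        "backend": jax.default_backend(),  # shows whether a GPU backend was picked up
        "devices": [str(d) for d in jax.devices()],
    }
    (out / "environment.json").write_text(json.dumps(env, indent=2))
    (out / "before_optimizations.txt").write_text(lowered.as_text())
    # as_text() on a compiled program may return None on some backends.
    (out / "after_optimizations.txt").write_text(compiled.as_text() or "")
    return env


def attention_scores(q, k):
    # Small stand-in for a transformer kernel: scaled dot product + softmax.
    return jax.nn.softmax(q @ k.T / jnp.sqrt(q.shape[-1]), axis=-1)


q = jnp.ones((4, 8), dtype=jnp.float32)
k = jnp.ones((4, 8), dtype=jnp.float32)

report_dir = tempfile.mkdtemp(prefix="jax_report_")
env = collect_report(attention_scores, q, k, out_dir=report_dir)
print(env["backend"], report_dir)
```

Checking `env["backend"]` right after installation is also a quick way to confirm that the ROCm wheel is in use rather than a CPU fallback; the exact backend string depends on the installed plugin.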
Topics
- OpenXLA
- JAX
- AMD ROCm
- CI/CD
- AMD Instinct GPUs
- Triton Compiler
Best for: MLOps Engineer, Research Scientist, Machine Learning Engineer, AI Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.