AMD GPU Programming From Beginner to Expert (Part 1) - TensorDescriptor in Composable Kernel (CK)

Source: AMD ROCm Blogs · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, long

Summary

This article, the first in a series on AMD GPU programming, introduces TensorDescriptor, the Composable Kernel (CK) framework's core abstraction for multi-dimensional data layouts and their transformations. It explains how a TensorDescriptor uses a tree of "Transforms" (such as Embed, Unmerge, Merge, and PassThrough) to map logical coordinates to physical memory addresses. The article walks through building a 3D tensor from a 2D base with these transforms and includes a C++ example that instantiates and chains them. It then presents a complete, optimized matrix transpose kernel for AMD GPUs, covering the host code, the kernel logic, and the measured performance: the kernel runs in 5.820 μs versus 8.4 μs for PyTorch, a ratio of about 1.443, which corresponds to the reported 44.3% throughput improvement.

Key takeaway

For AI Engineers and Machine Learning Engineers optimizing GPU kernel performance on AMD hardware, understanding CK's TensorDescriptor and its composable Transforms is crucial. You should explore implementing custom data layouts using chained `Unmerge`, `Merge`, and `PassThrough` transforms, and adopt the demonstrated 4x4 per-thread, register-level computation pattern for operations like matrix transpose to achieve significant throughput improvements, as shown by the 44.3% gain over PyTorch.
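The 4x4 per-thread pattern mentioned above can be illustrated on the host. The following is a minimal CPU-side sketch, not the article's HIP kernel: the names `transpose_tile` and `transpose_4x4_tiles` are hypothetical, the loop over tiles stands in for the GPU thread grid, and the local array `reg` stands in for per-thread registers. It assumes a row-major matrix whose dimensions are multiples of 4.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Tile edge handled by one (emulated) thread.
constexpr std::size_t kTile = 4;

// Work of one hypothetical "thread": load a 4x4 tile into a local array
// (standing in for registers), then store it to the transposed position.
// `in` is rows x cols, `out` is cols x rows, both row-major.
inline void transpose_tile(const float* in, float* out,
                           std::size_t rows, std::size_t cols,
                           std::size_t tile_r, std::size_t tile_c) {
    float reg[kTile][kTile];
    for (std::size_t r = 0; r < kTile; ++r)
        for (std::size_t c = 0; c < kTile; ++c)
            reg[r][c] = in[(tile_r * kTile + r) * cols + (tile_c * kTile + c)];

    // Element (r, c) of the input tile lands at (c, r) of the output tile.
    for (std::size_t c = 0; c < kTile; ++c)
        for (std::size_t r = 0; r < kTile; ++r)
            out[(tile_c * kTile + c) * rows + (tile_r * kTile + r)] = reg[r][c];
}

// Host loop that plays the role of the thread grid: one call per tile.
inline std::vector<float> transpose_4x4_tiles(const std::vector<float>& in,
                                              std::size_t rows,
                                              std::size_t cols) {
    assert(rows % kTile == 0 && cols % kTile == 0);  // sketch assumption
    assert(in.size() == rows * cols);
    std::vector<float> out(in.size());
    for (std::size_t tr = 0; tr < rows / kTile; ++tr)
        for (std::size_t tc = 0; tc < cols / kTile; ++tc)
            transpose_tile(in.data(), out.data(), rows, cols, tr, tc);
    return out;
}
```

On a GPU, each tile would presumably be handled by a separate thread, so both the loads and the stores touch four consecutive elements per row; the sketch only shows the index arithmetic, not the memory-coalescing or vectorization details of the original kernel.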

Key insights

Composable Kernel's TensorDescriptor uses hierarchical transforms to efficiently manage complex multi-dimensional data layouts on AMD GPUs.

Principles

Method

TensorDescriptor defines tensors using a tree of Transforms, each with a `CalculateLowerIndex` method, to map upper-level coordinates to lower-level ones, ultimately resolving to a linear memory offset.
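To make the coordinate mapping concrete, here is a minimal sketch in plain C++. It does not use CK's real classes or signatures: the structs `PassThrough`, `Unmerge2`, `Merge2`, `Embed2`, and `View3D` are hypothetical stand-ins, and the chain shown (a 3D view `(M0, M1, N)` over a row-major 2D base) is one plausible arrangement rather than the article's exact example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-ins for CK transforms; these are NOT CK's real classes.

// PassThrough: the upper index is forwarded unchanged.
struct PassThrough {
    std::size_t calculate_lower_index(std::size_t upper) const { return upper; }
};

// Unmerge: two upper indices (i0, i1) collapse into one lower index,
// assuming packed upper dimensions: lower = i0 * len1 + i1.
struct Unmerge2 {
    std::size_t len0, len1;
    std::size_t calculate_lower_index(std::size_t i0, std::size_t i1) const {
        assert(i0 < len0 && i1 < len1);
        return i0 * len1 + i1;
    }
};

// Merge: one upper index is decomposed into two lower indices.
struct Merge2 {
    std::size_t len0, len1;
    std::array<std::size_t, 2> calculate_lower_index(std::size_t upper) const {
        assert(upper < len0 * len1);
        return {upper / len1, upper % len1};
    }
};

// Embed: a 2D index becomes a linear offset through explicit strides.
struct Embed2 {
    std::size_t stride0, stride1;
    std::size_t calculate_lower_index(std::size_t i0, std::size_t i1) const {
        return i0 * stride0 + i1 * stride1;
    }
};

// A 3D view (M0, M1, N) over a row-major 2D base of shape (M0 * M1, N).
// Each level resolves its upper coordinates to the level below it.
struct View3D {
    Unmerge2 split_m;    // (m0, m1) -> m
    PassThrough keep_n;  // n -> n
    Embed2 base;         // (m, n) -> linear offset

    std::size_t offset(std::size_t m0, std::size_t m1, std::size_t n) const {
        const std::size_t m = split_m.calculate_lower_index(m0, m1);
        const std::size_t n_low = keep_n.calculate_lower_index(n);
        return base.calculate_lower_index(m, n_low);
    }
};

inline View3D make_view(std::size_t m0, std::size_t m1, std::size_t n) {
    return View3D{Unmerge2{m0, m1}, PassThrough{}, Embed2{n, 1}};
}
```

For a view built with `make_view(2, 4, 8)`, the coordinate `(1, 2, 3)` first resolves to row `1 * 4 + 2 = 6` and then to offset `6 * 8 + 3 = 51`. Applying `Merge2{8, 8}` to 51 recovers the 2D pair `(6, 3)`, which shows Merge running the decomposition in the opposite direction.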

Best for: AI Engineer, Machine Learning Engineer, Software Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.