Automating GPU Kernel Translation with AI Agents: cuTile Python to cuTile.jl
Summary
NVIDIA CUDA Tile (cuTile) is a tile-based programming model for GPU kernels. cuTile.jl brings this model to Julia, so custom GPU kernels can be written without CUDA C++. This gives Julia's scientific computing ecosystem, including differential equations and physics simulations, access to optimized GPU acceleration. The key challenge is translating existing cuTile Python kernels to cuTile.jl: the two differ semantically in indexing, broadcasting, memory layout, and loop forms, and a mistranslation can cause silent data corruption rather than a compiler error. To address this, NVIDIA developed an AI-assisted workflow, packaged as an LLM skill in TileGym, that systematizes translation by encoding critical rules, API mappings, and validation steps. The skill supports accurate, repeatable translation of GPU kernels between domain-specific languages (DSLs), demonstrated with matrix multiplication and softmax examples.
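To make the "silent data corruption" point concrete, here is a minimal sketch in plain Python. It does not use cuTile or cuTile.jl. It assumes the indexing and memory-layout differences include the familiar split between 0-based, row-major access (the NumPy-style default in Python) and 1-based, column-major access (how Julia arrays are stored). The helper names are hypothetical.

```python
# Illustrative only: plain Python, not cuTile or cuTile.jl code.
ROWS, COLS = 2, 3
matrix = [[1, 2, 3],
          [4, 5, 6]]

# Flatten in row-major order, the layout NumPy-style Python code assumes by default.
row_major = [matrix[r][c] for r in range(ROWS) for c in range(COLS)]

def read_row_major(buf, r, c):
    """0-based (r, c) lookup in a row-major buffer."""
    return buf[r * COLS + c]

def read_col_major_1based(buf, i, j):
    """1-based (i, j) lookup that assumes column-major storage, as Julia arrays use."""
    return buf[(j - 1) * ROWS + (i - 1)]

# Both calls ask for the same logical element: first row, second column.
correct = read_row_major(row_major, 0, 1)
# The index shift is handled, but the layout assumption is wrong for this buffer.
mismatched = read_col_major_1based(row_major, 1, 2)

print(correct, mismatched)
```

Neither lookup raises an error, yet they return different values for the same logical element. A mistake like this passes compilation and only shows up when results are compared.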
Key takeaway
AI Engineers and Research Scientists porting GPU kernels between domain-specific languages such as cuTile Python and cuTile.jl should use structured AI agent skills. This approach, exemplified by TileGym's conversion skill, captures critical translation rules and pitfalls, reducing manual effort and helping prevent silent data corruption. Systematizing the process with validated examples and static checkers makes kernel translations faster and more reliable.
Key insights
AI-assisted workflows can translate GPU kernels between DSLs by encoding domain-specific rules and pitfalls.
Principles
- Semantic differences between DSLs require explicit translation rules.
- Shared abstractions can mask critical surface-level divergences.
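One way to picture an "explicit translation rule" is as a small static check run over translated source. The sketch below is a hypothetical illustration, not TileGym's checker. It encodes a single rule that follows from Julia arrays being 1-based: a literal `0` in the leading index position is suspicious. The function name and the regular expression are assumptions made for this example.

```python
import re

# Hypothetical rule: a literal 0 as the leading index is suspicious in Julia,
# whose arrays are 1-based. Only the first index position is checked.
ZERO_INDEX = re.compile(r"\w\[\s*0\s*[\],]")

def find_zero_index_lines(julia_source):
    """Return 1-based line numbers whose code contains a literal 0 leading index."""
    hits = []
    for lineno, line in enumerate(julia_source.splitlines(), start=1):
        code = line.split("#", 1)[0]  # crude: drop everything after a Julia line comment
        if ZERO_INDEX.search(code):
            hits.append(lineno)
    return hits

sample = """x = a[0]      # carried over from Python
y = a[1]
z = b[0, j]
"""
flagged = find_zero_index_lines(sample)
print(flagged)
```

A check this simple misses cases such as a zero in a later index position or an index computed at runtime. It is best read as a first filter that runs before testing, not a replacement for it.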
Method
The method has five steps: analyze the source kernel, apply API mappings and critical rules, run static validation, test against reference implementations, and debug using a structured guide.
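The "test against reference implementations" step can be sketched as an elementwise comparison within a tolerance. The example below is plain Python and uses softmax, one of the article's two demonstration kernels. The tolerances, the function names, and the stand-in for the translated kernel's output are all assumptions made for illustration.

```python
import math

def softmax_reference(row):
    """Numerically stable reference softmax over one row."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def matches_reference(candidate, reference, rel_tol=1e-5, abs_tol=1e-8):
    """True when two equal-length vectors agree elementwise within tolerance."""
    if len(candidate) != len(reference):
        return False
    return all(math.isclose(c, r, rel_tol=rel_tol, abs_tol=abs_tol)
               for c, r in zip(candidate, reference))

row = [1.0, 2.0, 3.0]
expected = softmax_reference(row)

# Stand-in for the translated kernel's output: a naive softmax in plain Python.
naive_total = sum(math.exp(v) for v in row)
translated_output = [math.exp(v) / naive_total for v in row]

ok = matches_reference(translated_output, expected)
# A result with two entries swapped, as an indexing slip might produce.
bad = matches_reference([expected[1], expected[0], expected[2]], expected)
print(ok, bad)
```

Comparing within a tolerance, not for exact equality, allows for the small floating-point differences that two correct implementations can produce, while still catching the misplaced-element errors described above.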
In practice
- Use TileGym's Julia subproject for cuTile.jl kernels.
- Employ the conversion skill for Python-to-Julia kernel translation.
- Ensure Julia 1.12+ and a CUDA 13.1+ driver are installed.
Topics
- cuTile Python
- cuTile.jl
- GPU Kernel Translation
- AI Agents
- LLM Skills
Best for: Machine Learning Engineer, AI Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by NVIDIA Technical Blog.