Taking on CUDA With ROCm: ‘One Step After Another’
Summary
Anush Elangovan, AMD's VP of AI Software, details the recent advances and strategic direction of ROCm, AMD's AI software stack, as it challenges Nvidia's CUDA in the data center GPU market. Since AMD acquired Nod.ai two and a half years ago, ROCm has received consistent investment and has grown from a collection of disparate parts into a unified platform. The team aims for a six-week release cadence and a "just works" user experience akin to Google Chrome. Key developments include OneROCm, which unifies the AI stacks across AMD's CPUs, GPUs, and FPGAs to improve portability. AMD has also invested heavily in OpenAI's Triton framework and in MLIR, which enable cross-GPU compatibility and reduce the need for direct CUDA-to-HIP kernel conversions. ROCm is 100% open source (excluding firmware), and AMD fosters its developer community through direct engagement: Elangovan personally addresses complaints on X (Twitter), and over 1,000 GitHub issues related to older hardware support have been resolved.
Key takeaway
For AI/ML engineering leaders evaluating GPU infrastructure, AMD's ROCm now presents a more viable and mature alternative to Nvidia's CUDA than it did before the Nod.ai acquisition. Your teams can benefit from its unified stack, open-source nature, and portability via Triton and MLIR, which reduce vendor lock-in and may lower costs. Consider piloting ROCm on AMD Instinct MI355X or forthcoming MI450 hardware, especially for LLM inference workloads where tokens-per-second throughput is critical, and engage with AMD's responsive developer community for support.
Key insights
AMD's ROCm software stack is rapidly maturing, leveraging open-source tools and community engagement to challenge Nvidia's CUDA dominance.
Principles
- Unify AI stacks for hardware portability.
- Embrace open source for community-driven innovation.
- Prioritize developer experience and direct feedback.
Method
AMD's ROCm development focuses on a unified stack (OneROCm), heavy investment in Triton and MLIR for cross-platform compatibility, and a rapid six-week release cadence, supported by direct developer outreach and issue resolution.
In practice
- Utilize Triton for GPU-agnostic kernel development.
- Leverage vLLM or SGLang for LLM inference.
- Engage with ROCm's open-source community for support.
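For the inference item above, a pilot usually comes down to comparing tokens per second across backends. The following is a minimal sketch of that arithmetic, not a method taken from the article: `generate` is a hypothetical callable that wraps whichever engine is under test (vLLM, SGLang, or anything else) and returns the number of tokens produced for a prompt, and `measure_throughput` and `tokens_per_second` are illustrative names, not part of any library.

```python
import time
from typing import Callable, Sequence


def tokens_per_second(total_tokens: int, elapsed_seconds: float) -> float:
    """Throughput = generated tokens / wall-clock seconds."""
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return total_tokens / elapsed_seconds


def measure_throughput(
    generate: Callable[[str], int],
    prompts: Sequence[str],
    clock: Callable[[], float] = time.perf_counter,
) -> float:
    """Run every prompt through `generate` and return overall tokens/second.

    `generate` is a hypothetical wrapper (an assumption of this sketch):
    it takes a prompt and returns how many tokens the backend produced.
    """
    start = clock()
    total_tokens = sum(generate(prompt) for prompt in prompts)
    elapsed = clock() - start
    return tokens_per_second(total_tokens, elapsed)


if __name__ == "__main__":
    # Stand-in backend so the sketch runs without a GPU or an inference engine.
    def fake_generate(prompt: str) -> int:
        time.sleep(0.01)  # pretend to do some work
        return 8 * len(prompt.split())

    rate = measure_throughput(fake_generate, ["hello world", "one two three"])
    print(f"{rate:.1f} tokens/s")
```

Running the same prompts through the same wrapper on each candidate platform keeps the comparison like for like. A real pilot would also want warm-up runs and separate figures for time to first token, which this sketch leaves out.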
Topics
- ROCm
- CUDA
- AI Software Stack
- GPU Portability
- Triton Framework
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, AI Hardware Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Big Data & AI News - EE Times.