Out-of-the-Box ROLL Support on AMD GPUs: Accelerating Reinforcement Learning at Scale
Summary
Alibaba's open-source ROLL reinforcement learning framework now runs out of the box on AMD Instinct™ GPUs with AMD ROCm™ software. ROLL is designed for large-scale, distributed RL and is used to train Large Language Models for reasoning and agentic behaviors. AMD worked with the ROLL development team and contributed upstream enhancements so that the framework runs natively, with no code changes or custom builds required. The key improvements are full vLLM sleep mode compatibility for versions ≥ 0.11.0, support for both vLLM Engine v0 and v1, and Ray compatibility fixes for versions ≥ 2.48 so that HIP_VISIBLE_DEVICES is handled correctly. ROLL also provides asynchronous execution for better GPU utilization and agentic training for multi-agent workflows. The recommended setup path is the official Docker image rlsys/roll_opensource, which includes the necessary ROCm dependencies, Ray patches, vLLM engine support, and performance optimizations.
Key takeaway
Machine Learning Engineers scaling reinforcement learning workloads for Large Language Models can now deploy Alibaba's ROLL framework on AMD Instinct GPUs without custom patching or a complex setup, thanks to AMD's upstream contributions. Use the official rlsys/roll_opensource Docker image, and lower gpu_memory_utilization in your training configurations if you encounter CPU out-of-memory issues during distributed training.
Key insights
Alibaba's ROLL framework now natively supports AMD Instinct GPUs via ROCm, accelerating large-scale reinforcement learning for LLMs.
Principles
- Distributed RL frameworks enhance LLM capabilities.
- Asynchronous execution improves GPU utilization.
- Upstream contributions ensure OOTB hardware compatibility.
Method
Users can run ROLL on AMD GPUs by building or pulling the official Docker image, preparing data, configuring training parameters like gpu_memory_utilization, and setting environment variables for single or multi-node execution.
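The configuration step can be sketched in Python. This is a minimal illustration rather than ROLL's own tooling: the helper name lower_gpu_memory_utilization and the nested layout of example_config are assumptions made for the example, and only the gpu_memory_utilization parameter name comes from the article. The helper caps every gpu_memory_utilization entry it finds in an already-loaded configuration and leaves the original untouched.

```python
import copy


def lower_gpu_memory_utilization(config, new_value):
    """Return a copy of config with every gpu_memory_utilization capped at new_value."""
    if not 0.0 < new_value <= 1.0:
        raise ValueError("gpu_memory_utilization must be in (0, 1]")
    updated = copy.deepcopy(config)  # never mutate the caller's config

    def _walk(node):
        if isinstance(node, dict):
            for key, value in node.items():
                if key == "gpu_memory_utilization" and isinstance(value, (int, float)):
                    # Only ever lower the value, never raise it.
                    node[key] = min(float(value), new_value)
                else:
                    _walk(value)
        elif isinstance(node, list):
            for item in node:
                _walk(item)

    _walk(updated)
    return updated


# Hypothetical config layout, for illustration only; check the real
# ROLL configuration files for the actual key names and nesting.
example_config = {
    "actor_infer": {
        "strategy_args": {
            "strategy_name": "vllm",
            "strategy_config": {"gpu_memory_utilization": 0.8},
        }
    }
}

tuned = lower_gpu_memory_utilization(example_config, 0.6)
print(tuned["actor_infer"]["strategy_args"]["strategy_config"])
```

Walking the whole structure, instead of addressing one fixed path, means the sketch does not depend on where a given configuration happens to place the setting.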
In practice
- Use the rlsys/roll_opensource Docker image for setup.
- Reduce gpu_memory_utilization to mitigate CPU OOM.
- Ensure Ray version ≥ 2.48 for correct GPU binding.
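The version floors mentioned in this summary (Ray ≥ 2.48, vLLM ≥ 0.11.0) can be checked before launching a job. The sketch below is an assumption-laden preflight helper, not part of ROLL: the names version_tuple, meets_minimum, MINIMUMS, and check_versions are hypothetical, and the comparison looks only at the leading numeric components, so a pre-release such as 2.48.0rc1 is treated like 2.48.0.

```python
import re

# Minimum versions named in the summary above.
MINIMUMS = {"ray": "2.48", "vllm": "0.11.0"}


def version_tuple(version):
    """Extract leading numeric components, e.g. '0.11.0+rocm' -> (0, 11, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """Compare two version strings numerically, padding the shorter with zeros."""
    a = version_tuple(installed)
    b = version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def check_versions(installed):
    """Map each required package to True/False; missing packages count as failing."""
    return {
        name: name in installed and meets_minimum(installed[name], minimum)
        for name, minimum in MINIMUMS.items()
    }


# Example with made-up installed versions.
print(check_versions({"ray": "2.49.1", "vllm": "0.10.2"}))
```

In a real environment the installed versions could be read with the standard library call importlib.metadata.version("ray") and passed in as the dictionary. Comparing numeric tuples rather than raw strings matters here, because a plain string comparison would rank "2.9" above "2.48".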
Topics
- Reinforcement Learning
- Large Language Models
- AMD ROCm
- ROLL Framework
- Distributed Training
- vLLM
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AMD ROCm Blogs.