Accelerating Disaggregated RL for Visual Generative LLMs with Diffusion-Based Parallelism and Trainer-Assisted Generation
Summary
DigenRL is a disaggregated Reinforcement Learning (RL) framework designed to accelerate training for diffusion-based visual generative Large Language Models (LLMs). Existing colocated RL systems, such as veRL-Omni, couple rollout and training resources on the same GPUs, which hinders heterogeneous deployment and prevents the two stages from scaling independently. DigenRL separates them, which allows flexible resource allocation, support for heterogeneous GPUs, and efficient task scheduling. Its key innovations are a generation-axis pipeline (GAP) and time-step parallelism (TSP) for finer-grained pipelining between rollout and training, an elastic trainer-assisted generation (TAG) approach in which trainer GPUs dynamically help with rollout generation, and an asynchronous strategy tightly constrained to one step to improve pipeline utilization. Experiments on three hardware testbeds with 16 to 32 GPUs, using HunyuanVideo-13B, Wan2.1-14B, FLUX.1-12B, and QwenImage-20B, show 1.56-2.10x throughput improvements over the existing diffusion RL systems veRL-Omni and GenRL.
Key takeaway
For AI Architects and Machine Learning Engineers running large-scale RL training of visual generative LLMs, DigenRL reports a significant throughput advantage. If your current system relies on colocated execution, consider a disaggregated architecture like DigenRL's, which achieved 1.56-2.10x higher throughput than veRL-Omni and GenRL in the reported experiments. The framework lets you allocate rollout and training resources independently and use heterogeneous GPUs more efficiently, which can shorten training time and lower compute cost.
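To put the reported throughput range in wall-clock terms, the short calculation below converts a throughput speedup into the fraction of time saved. It assumes a fixed workload, so that time scales as the inverse of throughput; the function name `time_saved_fraction` is illustrative and not part of DigenRL.

```python
def time_saved_fraction(speedup: float) -> float:
    """Fraction of wall-clock time saved on a fixed workload for a given throughput speedup."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    # Time for a fixed workload scales as 1 / throughput.
    return 1.0 - 1.0 / speedup


for s in (1.56, 2.10):
    print(f"{s:.2f}x throughput -> {time_saved_fraction(s):.1%} less wall-clock time")
# 1.56x throughput -> 35.9% less wall-clock time
# 2.10x throughput -> 52.4% less wall-clock time
```

Under that assumption, the reported range corresponds to roughly a third to a half less training time for the same workload.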
Key insights
DigenRL disaggregates RL for visual generative LLMs, using parallelism and trainer assistance to achieve 1.56-2.10x throughput gains.
Principles
- Disaggregate resources for flexible scaling.
- Pipeline rollout and training for efficiency.
- Dynamically reallocate idle GPU resources.
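The third principle is the idea behind trainer-assisted generation: a trainer GPU that would otherwise sit idle takes a share of the rollout work. The summary does not describe DigenRL's actual scheduler, so the following is only a minimal sketch using a generic greedy earliest-finish-time assignment. The `Worker` class and the `secs_per_sample` and `free_at` fields are hypothetical names chosen for illustration.

```python
import heapq
from dataclasses import dataclass


@dataclass
class Worker:
    name: str
    secs_per_sample: float  # hypothetical per-sample generation time on this GPU
    free_at: float = 0.0    # time at which this GPU becomes available for rollout


def assign(num_samples, workers):
    """Greedily give each sample to the worker that would finish it earliest.

    Returns (samples per worker, time at which the last sample finishes).
    """
    # Heap entries: (finish time of the next sample on this worker, worker index).
    heap = [(w.free_at + w.secs_per_sample, i) for i, w in enumerate(workers)]
    heapq.heapify(heap)
    counts = {w.name: 0 for w in workers}
    makespan = 0.0
    for _ in range(num_samples):
        finish, i = heapq.heappop(heap)
        counts[workers[i].name] += 1
        makespan = max(makespan, finish)
        heapq.heappush(heap, (finish + workers[i].secs_per_sample, i))
    return counts, makespan


rollout_only = [Worker("rollout0", 1.0), Worker("rollout1", 1.0)]
# A slower trainer GPU (heterogeneous hardware) that is idle and can help.
with_trainer = rollout_only + [Worker("trainer0", 2.0)]

print(assign(10, rollout_only))   # ({'rollout0': 5, 'rollout1': 5}, 5.0)
print(assign(10, with_trainer))   # ({'rollout0': 4, 'rollout1': 4, 'trainer0': 2}, 4.0)
```

In this toy setting, even a trainer GPU that generates at half the speed of a rollout GPU shortens the rollout phase from 5.0 to 4.0 time units. A real system would also have to account for weight synchronization and for when the trainer must return to training, which this sketch ignores.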
Method
DigenRL combines a generation-axis pipeline (GAP) and time-step parallelism (TSP), elastic trainer-assisted generation (TAG), and an asynchronous strategy tightly constrained to one step to optimize disaggregated RL for diffusion-based visual generative LLMs.
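The effect of a one-step asynchronous constraint can be illustrated with a small timing model. This is a sketch under stated assumptions, not DigenRL's implementation: it assumes constant per-step rollout and training times, no communication cost, and that "one-step constrained" means the rollout for step k may use a policy that is at most one update stale, so it can overlap with training on step k-1 but must wait for training on step k-2. The function names are illustrative.

```python
def sync_makespan(n_steps, rollout_time, train_time):
    """Strictly alternating rollout and training: nothing overlaps."""
    return n_steps * (rollout_time + train_time)


def one_step_async_makespan(n_steps, rollout_time, train_time):
    """Rollout for step k may overlap training on step k-1 (staleness of at most one)."""
    rollout_end, train_end = [], []
    for k in range(n_steps):
        # The rollout pool is busy until the previous rollout finishes.
        prev_rollout = rollout_end[k - 1] if k >= 1 else 0.0
        # Staleness bound: the policy produced by training step k-2 must be ready.
        needed_policy = train_end[k - 2] if k >= 2 else 0.0
        r_end = max(prev_rollout, needed_policy) + rollout_time
        # Training on step k needs its rollout data and a free trainer.
        prev_train = train_end[k - 1] if k >= 1 else 0.0
        t_end = max(r_end, prev_train) + train_time
        rollout_end.append(r_end)
        train_end.append(t_end)
    return train_end[-1] if train_end else 0.0


print(sync_makespan(4, 3.0, 2.0))            # 20.0
print(one_step_async_makespan(4, 3.0, 2.0))  # 14.0
```

In this model the slower of the two stages sets the pace and the faster stage is hidden behind it, so total time approaches `n_steps * max(rollout_time, train_time)` instead of `n_steps * (rollout_time + train_time)`.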
In practice
- Use DigenRL for RL training of diffusion-based visual generative LLMs.
- Run rollout and training on heterogeneous GPU pools.
- Size rollout and training resources independently to match each stage's load.
Topics
- Reinforcement Learning
- Diffusion Models
- Generative LLMs
- Disaggregated Systems
- GPU Parallelism
- Throughput Optimization
- DigenRL
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.