DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs
Summary
Diffusion large language models (dLLMs) decode text in parallel, but the widely used fixed, predefined block schedule is suboptimal: it can force premature commitments on some tokens while delaying positions that are easy to decode. This work introduces Dynamic Sliding Block (DSB), a training-free method that slides a block of dynamic size over the sequence to adapt to semantic difficulty. To further improve efficiency, the authors propose DSB Cache, a KV-cache mechanism tailored to DSB. Experiments on LLaDA-8B-Instruct, LLaDA-1.5, Dream-v0-Base-7B, and Dream-v0-Instruct-7B across benchmarks such as GSM8K and HumanEval show that DSB and DSB Cache consistently improve both generation quality and inference efficiency, reaching up to 81.96 accuracy and 99.61 tokens per second (TPS) on LLaDA variants.
Key takeaway
For Machine Learning Engineers optimizing dLLM inference, Dynamic Sliding Block (DSB) and DSB Cache are worth adopting. The approach is training-free: it adapts block sizes dynamically and manages KV-cache instability, and it consistently improves generation quality and inference speed. Evaluate the DSB (const.) and DSB (greedy) variants, tuning parameters such as S_init and S_max to balance throughput and accuracy for your specific models and benchmarks.
Key insights
Dynamic block scheduling combined with a tailored KV cache consistently improves both dLLM generation quality and inference efficiency.
Principles
- Fixed block schedules degrade dLLM quality and efficiency.
- Dynamic block adaptation improves semantic coherence.
- KV-cache must adapt to dynamic block movement.
Method
DSB uses a sliding block with dynamic size, updating boundaries based on unmasking state. DSB Cache employs a prefix window and periodic global refreshes to stabilize KV states under block movement.
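The two mechanisms described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the paper's algorithm: the summary does not give the exact update rules, so the function names (update_block, positions_to_recompute), the parameters prefix_window and refresh_every, and the reading of S_init and S_max as the initial and maximum block sizes are all hypothetical.

```python
from typing import List, Tuple


def update_block(masked: List[bool], start: int, end: int,
                 s_init: int, s_max: int) -> Tuple[int, int]:
    """Return new [start, end) boundaries for the active block.

    masked[i] is True while position i is still masked.
    Assumed rule (illustrative only): the left edge slides past positions
    that are already unmasked, and the block spans at least s_init and at
    most s_max positions, clipped to the sequence length.
    """
    n = len(masked)
    # Slide the left edge past every position that is already unmasked.
    while start < n and not masked[start]:
        start += 1
    # Keep at least s_init positions in view, never more than s_max.
    end = max(end, start + s_init)
    end = min(end, start + s_max, n)
    return start, end


def positions_to_recompute(step: int, start: int, end: int, n: int,
                           prefix_window: int, refresh_every: int) -> range:
    """Hypothetical cache policy: positions whose KV states are recomputed.

    Every refresh_every steps the whole sequence is refreshed; otherwise
    only the active block plus a short window of prefix tokens before it.
    """
    if step % refresh_every == 0:
        return range(0, n)  # periodic global refresh
    return range(max(0, start - prefix_window), end)


if __name__ == "__main__":
    masked = [False, False, True, False, True, True, True, True]
    start, end = update_block(masked, start=0, end=4, s_init=4, s_max=6)
    print(start, end)
    print(list(positions_to_recompute(3, start, end, len(masked), 1, 4)))
```

In this sketch the block never shrinks below the first masked position, so easy positions to the right can enter the block as soon as the prefix is resolved, while the prefix window keeps KV states near the moving left edge fresh between global refreshes.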
In practice
- Implement DSB for dLLM inference to improve quality.
- Integrate DSB Cache to optimize throughput.
- Adjust S_init and S_max for specific trade-offs.
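One way to approach the tuning step in the list above is a small sweep that keeps only the configurations not beaten on both accuracy and throughput. The sketch below is a generic Pareto filter, not something taken from the paper: the helper name pareto_sweep, the evaluate callback (which you would implement by running your own benchmark), and the assumption that S_init should not exceed S_max are all hypothetical.

```python
from typing import Callable, Iterable, List, Tuple

Config = Tuple[int, int]      # (s_init, s_max)
Result = Tuple[float, float]  # (accuracy, tokens_per_second)


def pareto_sweep(configs: Iterable[Config],
                 evaluate: Callable[[int, int], Result]
                 ) -> List[Tuple[Config, Result]]:
    """Evaluate each (s_init, s_max) pair and return the Pareto front.

    evaluate is a user-supplied function that runs a benchmark and
    returns (accuracy, tokens_per_second) for one configuration.
    Pairs with s_init > s_max are skipped (an assumption of this sketch).
    """
    results = [((s_init, s_max), evaluate(s_init, s_max))
               for s_init, s_max in configs if s_init <= s_max]
    front = []
    for cfg, (acc, tps) in results:
        # A config is dominated if another one is at least as good on both
        # metrics and strictly better on at least one.
        dominated = any(
            a >= acc and t >= tps and (a > acc or t > tps)
            for _, (a, t) in results
        )
        if not dominated:
            front.append((cfg, (acc, tps)))
    return front
```

Calling pareto_sweep with a grid of candidate pairs and your own evaluate function returns the configurations worth comparing by hand; which point on the front to pick depends on whether accuracy or throughput matters more for the deployment.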
Topics
- Diffusion LLMs
- Block Scheduling
- KV Caching
- Inference Optimization
- Text Generation
- LLaDA Models
- Dream Models
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.