DSB: Dynamic Sliding Block Scheduling for Diffusion LLMs

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, extended

Summary

Diffusion large language models (dLLMs) support parallel decoding for text generation, but the widely used fixed, predefined block schedule is suboptimal: it commits prematurely to uncertain positions and delays positions that are easy to decode. This work introduces Dynamic Sliding Block (DSB), a training-free method that uses a sliding block with dynamic size to adapt to semantic difficulty. To further improve efficiency, the authors propose DSB Cache, a KV-cache mechanism tailored to DSB. Experiments on LLaDA-8B-Instruct, LLaDA-1.5, Dream-v0-Base-7B, and Dream-v0-Instruct-7B across benchmarks such as GSM8K and HumanEval show that DSB and DSB Cache consistently improve both generation quality and inference efficiency, reaching up to 81.96 accuracy and 99.61 tokens per second (TPS) on LLaDA variants.

Key takeaway

For machine learning engineers optimizing dLLM inference, Dynamic Sliding Block (DSB) and DSB Cache are worth adopting. This training-free approach adapts block sizes dynamically and manages the KV-cache instability caused by block movement, consistently improving generation quality and inference speed. Evaluate the DSB (const.) and DSB (greedy) variants, tuning parameters such as S_init and S_max to balance throughput and accuracy for your models and benchmarks.
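One way to organize such a sweep is to record accuracy and TPS for each (S_init, S_max) pair and keep only the configurations that no other configuration beats on both metrics. The sketch below is a generic Pareto filter, not something taken from the paper; the function name `pareto_front` and the shape of the `results` dictionary are assumptions made for illustration, and the measurements must come from your own runs.

```python
from typing import Dict, List, Tuple

Config = Tuple[int, int]       # hypothetical key: (S_init, S_max)
Metrics = Tuple[float, float]  # (accuracy, tokens per second), measured by you


def pareto_front(results: Dict[Config, Metrics]) -> List[Config]:
    """Return configs that no other config matches or beats on both metrics."""
    front = []
    for cfg, (acc, tps) in results.items():
        dominated = any(
            # "other" dominates "cfg" if it is at least as good on both
            # metrics and strictly better on at least one of them.
            (a >= acc and t >= tps) and (a > acc or t > tps)
            for other, (a, t) in results.items()
            if other != cfg
        )
        if not dominated:
            front.append(cfg)
    return sorted(front)
```

The surviving configurations are the candidates worth comparing by hand; which one to pick depends on how much accuracy you are willing to trade for throughput.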

Key insights

Dynamic block scheduling combined with a tailored KV-cache consistently improves dLLM generation quality and inference efficiency across the evaluated models and benchmarks.

Method

DSB uses a sliding block with dynamic size, updating boundaries based on unmasking state. DSB Cache employs a prefix window and periodic global refreshes to stabilize KV states under block movement.
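The idea can be illustrated with a toy sketch. Everything in it is an assumption made for illustration rather than the paper's algorithm: the boundary rule (the left edge follows the first still-masked position, and the right edge extends until the block holds `s_init` masked positions or spans `s_max` positions), the fixed per-position `confidence` list (a real model re-scores positions at every step), and the `recompute_range` cache policy with its hypothetical `prefix_window` and `refresh_every` parameters.

```python
from typing import List, Optional

MASK = None  # hypothetical marker for a position that is still masked


def slide_block(tokens: List[Optional[str]], s_init: int, s_max: int):
    """Return the active block [start, end) under the assumed boundary rule."""
    n = len(tokens)
    # Left edge: first position that is still masked (n if none remain).
    start = next((i for i, t in enumerate(tokens) if t is MASK), n)
    end, masked = start, 0
    # Right edge: extend until s_init masked positions are covered,
    # without letting the block span more than s_max positions.
    while end < n and end - start < s_max and masked < s_init:
        if tokens[end] is MASK:
            masked += 1
        end += 1
    return start, end


def decode(length: int, confidence: List[float], s_init: int = 2,
           s_max: int = 4, threshold: float = 0.9) -> List[List[int]]:
    """Toy decoding loop; returns the positions unmasked at each step."""
    tokens: List[Optional[str]] = [MASK] * length
    order = []
    while any(t is MASK for t in tokens):
        start, end = slide_block(tokens, s_init, s_max)
        candidates = [i for i in range(start, end) if tokens[i] is MASK]
        # Unmask every confident candidate, or the single best one otherwise.
        picked = [i for i in candidates if confidence[i] >= threshold]
        if not picked:
            picked = [max(candidates, key=lambda i: confidence[i])]
        for i in picked:
            tokens[i] = f"tok{i}"
        order.append(picked)
    return order


def recompute_range(step: int, start: int, end: int, n: int,
                    prefix_window: int, refresh_every: int) -> range:
    """Positions whose KV states are recomputed at this step (assumed policy)."""
    if step % refresh_every == 0:
        return range(0, n)  # periodic global refresh
    # Otherwise refresh only the block plus a short prefix window before it.
    return range(max(0, start - prefix_window), end)
```

In this toy setup a low-confidence position near the left edge stays masked while the block stretches to the right to pick up easier positions, which is the behavior a fixed block schedule cannot express.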

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.