LESS Is More: Mutual-Stability Sampling for Diffusion Language Models
Summary
LESS (Mutual-Stability Sampling) is a training-free, model-agnostic adaptive sampler for diffusion large language models (dLLMs) that addresses the inefficiency of fixed-budget decoding. dLLMs iteratively refine a masked sequence, which allows parallel token updates and bidirectional conditioning, but under a fixed step budget they keep spending computation on positions that are already stable. LESS treats token commitment as an online stopping problem and applies a joint stability rule: a masked position is unmasked only when its top-1 prediction has high confidence, its top-1 token persists across recent reverse steps, and its predictive distribution is stable under top-K inter-step Jensen–Shannon divergence. Evaluated on Dream-7B, LLaDA-8B, and LLaDA-1.5-8B across seven benchmarks (general knowledge, math, code), LESS improves average accuracy while using 72.1% fewer reverse steps than fixed-budget decoding, which translates to fewer Transformer forward passes, lower wall-clock latency, and reduced estimated inference compute.
Key takeaway
For Machine Learning Engineers optimizing diffusion LLM inference, LESS is a way to reduce computational overhead without retraining. It commits tokens adaptively based on a mutual-stability rule and is reported to use 72.1% fewer reverse steps than fixed-budget decoding while improving average accuracy. Consider integrating this training-free, model-agnostic sampler to lower wall-clock latency and inference compute for your dLLM deployments, especially for models like Dream-7B or LLaDA-8B.
Key insights
Adaptive sampling based on token stability significantly enhances diffusion LLM inference efficiency.
Principles
- Fixed-budget decoding wastes computation on stable tokens.
- Token commitment benefits from multi-criteria stability checks.
- Adaptive sampling can optimize dLLM efficiency.
Method
LESS implements mutual-stability sampling via a joint rule in which all three conditions must hold before a masked position is committed: the top-1 prediction is confident, the top-1 token persists across recent reverse steps, and the predictive distribution is stable under top-K inter-step Jensen–Shannon divergence.
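The joint rule can be illustrated with a minimal Python sketch for a single masked position. Everything specific here is an assumption made for illustration and is not taken from the paper: the function names `topk_jsd` and `is_mutually_stable`, the threshold values (`conf_min`, `persist`, `jsd_max`, `k`), and the choice to compute the top-K divergence by renormalizing both distributions over the current step's top-K tokens.

```python
import numpy as np

EPS = 1e-12  # avoids log(0) for tokens with zero probability


def topk_jsd(p_prev, p_curr, k):
    """Jensen-Shannon divergence restricted to the top-k tokens of p_curr.

    Assumption: both distributions are renormalized over those k tokens.
    """
    idx = np.argsort(p_curr)[-k:]
    a = p_prev[idx] + EPS
    b = p_curr[idx] + EPS
    a = a / a.sum()
    b = b / b.sum()
    m = 0.5 * (a + b)
    kl_am = np.sum(a * np.log(a / m))
    kl_bm = np.sum(b * np.log(b / m))
    return 0.5 * (kl_am + kl_bm)


def is_mutually_stable(history, conf_min=0.9, persist=2, k=5, jsd_max=0.01):
    """Return True if all three stability conditions hold for one position.

    history: list of 1-D probability vectors, oldest first, one per reverse step.
    Thresholds are hypothetical defaults, not values from the paper.
    """
    if len(history) < persist:
        return False
    recent = history[-persist:]
    curr = recent[-1]
    top1 = int(np.argmax(curr))

    confident = curr[top1] >= conf_min
    persistent = all(int(np.argmax(p)) == top1 for p in recent)
    stable = all(
        topk_jsd(recent[i], recent[i + 1], k) <= jsd_max
        for i in range(len(recent) - 1)
    )
    return bool(confident and persistent and stable)


# Same top-1 token, high confidence, nearly identical distributions.
settled = [np.array([0.95, 0.03, 0.02]), np.array([0.96, 0.02, 0.02])]
# Top-1 token changed between the two steps.
flipped = [np.array([0.03, 0.95, 0.02]), np.array([0.96, 0.02, 0.02])]
# Top-1 token and confidence unchanged, but the rest of the mass moved.
shifting = [np.array([0.91, 0.08, 0.01]), np.array([0.91, 0.01, 0.08])]

print(is_mutually_stable(settled, k=3))
print(is_mutually_stable(flipped, k=3))
print(is_mutually_stable(shifting, k=3))
```

The third case shows why a divergence check can add information beyond confidence and persistence: the top-1 token and its probability are unchanged, yet the distribution over the remaining tokens is still moving.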
In practice
- Integrate adaptive sampling for dLLM inference.
- Monitor top-1 prediction confidence for early token commitment.
- Track token persistence across reverse steps.
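The monitoring steps above can be sketched as a small tracker that is updated once per reverse step. This is a toy illustration under stated assumptions: `PersistenceTracker`, its methods, and the thresholds are hypothetical names and values, the per-step probabilities are hand-written rather than produced by a dLLM, and the sketch covers only the confidence and persistence checks, not the Jensen–Shannon condition.

```python
import numpy as np


class PersistenceTracker:
    """Tracks, per position, how many consecutive steps the top-1 token held."""

    def __init__(self, seq_len):
        self.last_top1 = np.full(seq_len, -1, dtype=np.int64)
        self.streak = np.zeros(seq_len, dtype=np.int64)

    def update(self, probs):
        """probs: (seq_len, vocab) array for the current reverse step."""
        top1 = probs.argmax(axis=-1)
        conf = probs.max(axis=-1)
        same = top1 == self.last_top1
        # Extend the streak if the top-1 token is unchanged, else restart at 1.
        self.streak = np.where(same, self.streak + 1, 1)
        self.last_top1 = top1
        return top1, conf

    def ready(self, conf, masked, conf_min=0.9, persist=2):
        """Boolean mask of still-masked positions that pass both checks."""
        return masked & (conf >= conf_min) & (self.streak >= persist)


# Hand-written probabilities for 2 positions over 3 reverse steps (vocab size 2).
steps = [
    np.array([[0.60, 0.40], [0.95, 0.05]]),
    np.array([[0.30, 0.70], [0.97, 0.03]]),
    np.array([[0.08, 0.92], [0.98, 0.02]]),
]

tracker = PersistenceTracker(seq_len=2)
masked = np.array([True, True])
committed_at = {}  # position -> (step index, committed token id)

for t, probs in enumerate(steps):
    top1, conf = tracker.update(probs)
    ready = tracker.ready(conf, masked)
    for pos in np.flatnonzero(ready):
        committed_at[int(pos)] = (t, int(top1[pos]))
    masked = masked & ~ready
    if not masked.any():
        break  # every position is committed, so no further steps are needed

print(committed_at)
```

In this toy run, position 1 is confident from the start and commits as soon as its token has persisted for two steps, while position 0 changes its top-1 token once and commits a step later. In a real sampler the committed tokens would be written back into the sequence before the next forward pass, which this sketch omits.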
Topics
- Diffusion Language Models
- Adaptive Sampling
- Inference Optimization
- Token Stability
- Computational Efficiency
- Large Language Models
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.