LESS Is More: Mutual-Stability Sampling for Diffusion Language Models

2026-06-15 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

LESS (Mutual-Stability Sampling) is a training-free, model-agnostic adaptive sampler for diffusion large language models (dLLMs) that addresses the inefficiency of fixed-budget decoding. dLLMs iteratively refine masked sequences, which allows parallel token updates and bidirectional conditioning, but under a fixed step budget they keep spending computation on positions whose predictions have already stabilized. LESS treats token commitment as an online stopping problem and applies a joint stability rule: a masked position is unmasked only when its top-1 prediction has high confidence, its top-1 token persists across recent reverse steps, and its predictive distribution is stable under top-K inter-step Jensen–Shannon divergence. Evaluated on Dream-7B, LLaDA-8B, and LLaDA-1.5-8B across seven benchmarks (general knowledge, math, code), LESS improves average accuracy while using 72.1% fewer reverse steps than fixed-budget decoding. That reduction translates to fewer Transformer forward passes, lower wall-clock latency, and reduced estimated inference compute.

Key takeaway

For machine learning engineers optimizing diffusion LLM inference, LESS offers a way to cut computational overhead without retraining. It commits tokens adaptively under a mutual-stability rule, and in the reported evaluation it used 72.1% fewer reverse steps than fixed-budget decoding. Because the sampler is training-free and model-agnostic, it is worth considering for lowering wall-clock latency and inference compute in dLLM deployments, especially with models like Dream-7B or LLaDA-8B.

Key insights

Committing tokens adaptively, as soon as they are stable, cut reverse steps by 72.1% relative to fixed-budget decoding while improving average accuracy on the evaluated models and benchmarks.

Principles

Fixed-budget decoding wastes computation on tokens that are already stable.
Token commitment benefits from checking several stability criteria jointly rather than confidence alone.
Adaptive sampling lets the number of reverse steps follow how quickly tokens stabilize.

Method

LESS implements mutual-stability sampling via a joint rule. A masked position is unmasked only when all three conditions hold: its top-1 prediction has high confidence, its top-1 token persists across recent reverse steps, and its predictive distribution is stable under top-K inter-step Jensen–Shannon divergence.
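
To make the joint rule concrete, here is a minimal sketch for a single masked position. It is an illustration under stated assumptions, not the authors' implementation: the function names, the thresholds (`conf_min`, `persist_steps`, `jsd_max`, `k`), and the choice to compute the top-K divergence on the current step's top-K support with renormalization are all hypothetical, since the summary does not specify them.

```python
import numpy as np

def topk_js_divergence(p_prev, p_curr, k=5, eps=1e-12):
    """JS divergence restricted to the current step's top-k tokens.

    Assumption: both distributions are renormalized over that support;
    the source does not state how the top-K restriction is defined.
    """
    idx = np.argsort(p_curr)[-k:]            # indices of the k most likely tokens now
    p = p_prev[idx] + eps
    q = p_curr[idx] + eps
    p, q = p / p.sum(), q / q.sum()
    m = 0.5 * (p + q)                        # mixture distribution
    kl = lambda a, b: float(np.sum(a * np.log(a / b)))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def is_mutually_stable(history, conf_min=0.9, persist_steps=2, k=5, jsd_max=0.01):
    """Decide whether one masked position may be unmasked.

    history: list of 1-D probability vectors for this position, oldest first.
    All thresholds are illustrative placeholders.
    """
    if len(history) < persist_steps:
        return False                         # not enough reverse steps observed yet
    recent = history[-persist_steps:]
    curr = recent[-1]
    top1 = int(np.argmax(curr))
    confident = curr[top1] >= conf_min
    persistent = all(int(np.argmax(p)) == top1 for p in recent)
    stable = all(
        topk_js_divergence(a, b, k) <= jsd_max
        for a, b in zip(recent[:-1], recent[1:])
    )
    return bool(confident and persistent and stable)
```

Requiring all three checks means a position with a single confident but fluctuating prediction stays masked, which is the behavior the joint rule is meant to enforce.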

In practice

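Replace fixed-budget decoding with an adaptive sampler for dLLM inference.
Monitor top-1 prediction confidence to commit tokens early.
Track top-1 token persistence across reverse steps.
Check that the predictive distribution is stable between steps (top-K Jensen–Shannon divergence) before unmasking.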
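
The confidence and persistence checks listed above can be tracked with a per-position run-length counter inside the decoding loop. The sketch below is a toy under loud assumptions: `toy_denoiser` is a hypothetical stand-in for a dLLM forward pass (it is not any real model's API), the thresholds are placeholders, and the distribution-stability check is left out for brevity. It only illustrates how the loop can stop before a fixed step budget is exhausted.

```python
import numpy as np

def toy_denoiser(step, num_positions, vocab_size):
    """Hypothetical stand-in for one reverse step of a dLLM.

    Position i sharpens toward token i % vocab_size; later positions sharpen more slowly.
    """
    logits = np.zeros((num_positions, vocab_size))
    for i in range(num_positions):
        logits[i, i % vocab_size] = (step + 1) * 2.0 / (i + 1)
    exp = np.exp(logits - logits.max(axis=1, keepdims=True))
    return exp / exp.sum(axis=1, keepdims=True)

def adaptive_decode(num_positions=4, vocab_size=5, max_steps=64,
                    conf_min=0.9, persist_steps=2):
    tokens = [None] * num_positions       # None marks a still-masked position
    last_top1 = [None] * num_positions    # top-1 token seen at the previous step
    run_length = [0] * num_positions      # consecutive steps with the same top-1
    steps_used = 0
    for step in range(max_steps):
        if all(t is not None for t in tokens):
            break                         # everything committed: stop early
        probs = toy_denoiser(step, num_positions, vocab_size)
        steps_used += 1
        for i in range(num_positions):
            if tokens[i] is not None:
                continue                  # committed positions are left alone
            top1 = int(np.argmax(probs[i]))
            run_length[i] = run_length[i] + 1 if top1 == last_top1[i] else 1
            last_top1[i] = top1
            if probs[i, top1] >= conf_min and run_length[i] >= persist_steps:
                tokens[i] = top1          # commit (unmask) this position
    return tokens, steps_used

tokens, steps_used = adaptive_decode()
print(tokens, steps_used)
```

With this toy schedule the loop finishes well short of `max_steps`; how much a real model saves depends on how quickly its predictions stabilize.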

Topics

Diffusion Language Models
Adaptive Sampling
Inference Optimization
Token Stability
Computational Efficiency
Large Language Models

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.