VoidPadding: Let [VOID] Handle Padding in Masked Diffusion Language Models so that [EOS] Can Focus on Semantic Termination

2026-06-16 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Expert, quick

Summary

VoidPadding is a technique for Masked Diffusion Language Models (MDLMs) that addresses "[EOS]" token overflow during large-block decoding. Current MDLMs often use "[EOS]" for both semantic termination and padding, and this dual role is what the technique targets. VoidPadding introduces a dedicated "[VOID]" token for padding so that "[EOS]" only marks semantic termination. During inference, the learned "[EOS]" signal enables early stopping, while the "[VOID]" signal guides adaptive expansion of the response canvas. Evaluated on Dream-7B-Instruct across mathematical reasoning and code generation benchmarks, VoidPadding improved the block-size-averaged four-task mean by 17.84 points over the original model and by 6.95 points over RainbowPadding, while reducing the number of function evaluations (NFE) in decoding by 55.7% on average.

Key takeaway

For machine learning engineers developing or fine-tuning Masked Diffusion Language Models, VoidPadding offers a way to mitigate "[EOS]" overflow and improve generation quality. Use the "[VOID]" token for padding so that "[EOS]" acts purely as a semantic terminator, which enables more reliable early stopping and adaptive response canvas expansion. On Dream-7B-Instruct, the reported gains on mathematical reasoning and code generation come with a 55.7% average reduction in decoding NFE.

Key insights

Decoupling "[EOS]" and padding roles in MDLMs with "[VOID]" prevents overflow and improves performance.

Principles

Dual-role tokens can lead to performance degradation.
Dedicated tokens improve model clarity and control.
Adaptive canvas expansion enhances decoding efficiency.

Method

VoidPadding introduces a "[VOID]" token for padding and reserves "[EOS]" for semantic termination, enabling early stopping via "[EOS]" and adaptive response canvas expansion via "[VOID]" during inference.
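
The padding split described above can be illustrated with a minimal sketch. It assumes that a training target consists of the response, a single "[EOS]", and then "[VOID]" up to the canvas length; the function names and the exact layout are hypothetical and are not taken from the VoidPadding code. The second function shows the dual-role baseline for contrast.

```python
from typing import List

EOS = "[EOS]"    # semantic terminator
VOID = "[VOID]"  # dedicated padding token


def pad_response(tokens: List[str], canvas_len: int) -> List[str]:
    """Hypothetical helper: end the response with one [EOS], then pad with [VOID].

    Assumption: exactly one [EOS] follows the response. The source only says
    that [VOID] handles padding and [EOS] handles termination.
    """
    if len(tokens) + 1 > canvas_len:
        raise ValueError("response plus [EOS] does not fit in the canvas")
    n_void = canvas_len - len(tokens) - 1
    return tokens + [EOS] + [VOID] * n_void


def pad_response_eos_only(tokens: List[str], canvas_len: int) -> List[str]:
    """Dual-role baseline for contrast: [EOS] is both terminator and padding."""
    if len(tokens) + 1 > canvas_len:
        raise ValueError("response plus [EOS] does not fit in the canvas")
    return tokens + [EOS] * (canvas_len - len(tokens))


if __name__ == "__main__":
    print(pad_response(["x", "=", "4"], 6))
    print(pad_response_eos_only(["x", "=", "4"], 6))
```

With the dedicated padding token, "[EOS]" appears once per target regardless of how much of the canvas is unused, so its frequency no longer grows with the amount of padding.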

In practice

Implement "[VOID]" for padding in MDLMs.
Use "[EOS]" for early stopping in diffusion models.
Explore adaptive canvas expansion for efficiency.
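
As a rough illustration of the inference-side items above, the sketch below shows one way a decoding loop might react to the two signals. The summary does not specify the decision rules, so everything here is an assumption: the "[MASK]" placeholder, the function names, the stop rule (a decoded "[EOS]" with no masked position before it), and the expand rule (a fully decoded canvas with neither "[EOS]" nor "[VOID]").

```python
from typing import List

MASK, EOS, VOID = "[MASK]", "[EOS]", "[VOID]"


def next_action(canvas: List[str]) -> str:
    """Hypothetical controller: return 'stop', 'expand', or 'continue'."""
    if EOS in canvas:
        end = canvas.index(EOS)
        # Assumed early-stop rule: everything before the first [EOS] is
        # decoded, so the remaining positions do not need further steps.
        return "stop" if MASK not in canvas[:end] else "continue"
    # Assumed expand rule: the canvas is fully decoded but holds neither a
    # terminator nor padding, so the response probably needs more room.
    if MASK not in canvas and VOID not in canvas:
        return "expand"
    return "continue"


def expand_canvas(canvas: List[str], extra: int) -> List[str]:
    """Append `extra` masked positions to the end of the canvas."""
    return canvas + [MASK] * extra


def final_response(canvas: List[str]) -> List[str]:
    """Keep only the tokens before the first [EOS], if there is one."""
    return canvas[: canvas.index(EOS)] if EOS in canvas else list(canvas)


if __name__ == "__main__":
    canvas = ["x", "=", "4", EOS, MASK, MASK]
    print(next_action(canvas), final_response(canvas))
```

Stopping as soon as the prefix before "[EOS]" is complete is what would save denoising steps in this sketch, since trailing positions are never decoded.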

Topics

Masked Diffusion Language Models
VoidPadding
[EOS] Token
[VOID] Token
Large-Block Decoding
Semantic Termination
Code Generation Benchmarks

Code references

Haru-LCY/VoidPadding

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.