Self-Generated Error Training for Token Editing in Diffusion Language Models
Summary
A new training approach, self-generated token-to-token (T2T) editing, addresses a training-inference mismatch in LLaDA2.1's block-diffusion decoding. The existing T2T editor, which revises already committed tokens, is trained on random vocabulary corruptions, yet at inference it faces the model's own fluent, high-confidence draft errors. The proposed method first runs a no-gradient draft pass in which the model predicts and fills masked positions, then runs a second pass that supervises recovery from these self-generated corruptions. Implemented as a short LoRA continued-pretraining pass on LLaDA2.1-mini, the technique was evaluated on several benchmarks using the official Q-Mode T2T procedure with unchanged inference parameters. The results indicate improved accuracy and reduced T2T edit intensity, mitigating common failure modes such as final-digit transcription errors and excessive self-correction in short factual answers.
Key takeaway
Machine learning engineers developing or fine-tuning diffusion language models such as LLaDA2.1 should consider self-generated error training for token editing. The approach addresses the training-inference mismatch directly by exposing the model to its own high-confidence draft errors, which improves accuracy and reduces over-correction. A short LoRA continued-pretraining pass is enough to integrate the method and improve how precisely the model revises committed tokens.
Key insights
Training language model editors on self-generated errors improves performance by aligning training with inference conditions.
Principles
- Align training data with inference error distribution.
- Self-correction mechanisms benefit from realistic error exposure.
- LoRA can efficiently adapt models for error recovery.
Method
Perform a no-gradient draft pass to generate predicted tokens for masked positions, then supervise recovery in a second pass using these self-generated corruptions.
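The data construction behind this two-pass step can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: `MASK`, `toy_draft`, and all function names are hypothetical, the "model" is a lookup table standing in for a forward pass run without gradients, and supervising every drafted position (including ones the draft got right) is an assumption the summary does not confirm. The random-corruption baseline is included only for contrast.

```python
import random

MASK = "<mask>"  # hypothetical mask symbol for this toy example


def random_corruption(clean, positions, vocab, rng):
    """Baseline: overwrite the chosen positions with random vocabulary tokens."""
    corrupted = list(clean)
    for i in positions:
        corrupted[i] = rng.choice(vocab)
    return corrupted


def self_generated_corruption(clean, positions, draft_fn):
    """Pass 1: mask the chosen positions and fill them with the model's own draft.

    In a real setup draft_fn would be a forward pass with gradients disabled,
    taking the most likely token at each masked position.
    """
    masked = [MASK if i in positions else tok for i, tok in enumerate(clean)]
    draft = draft_fn(masked)
    return [draft[i] if i in positions else clean[i] for i in range(len(clean))]


def editing_targets(clean, corrupted, positions):
    """Pass 2 supervision: (position, token the editor sees, token it should emit).

    Assumption: all drafted positions are supervised, so correct drafts act as
    "keep this token" examples and wrong drafts as "fix this token" examples.
    """
    return [(i, corrupted[i], clean[i]) for i in sorted(positions)]


def toy_draft(masked):
    """Stand-in model: fluent guesses, one of them a wrong final digit."""
    guesses = {1: "answer", 4: "3"}
    return [guesses.get(i, tok) if tok == MASK else tok for i, tok in enumerate(masked)]


clean = ["The", "answer", "is", "4", "2"]
positions = {1, 4}

corrupted = self_generated_corruption(clean, positions, toy_draft)
targets = editing_targets(clean, corrupted, positions)
baseline = random_corruption(clean, positions, ["zebra", "!!", "of"], random.Random(0))
```

Here `corrupted` keeps the sentence fluent but ends in the wrong digit, which is the kind of error the editor meets at inference, whereas `baseline` typically contains tokens that are obviously out of place. In an actual training loop, the second pass would feed `corrupted` to the model with gradients enabled and apply a cross-entropy loss toward the target tokens.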
In practice
- Implement a short LoRA continued-pretraining pass on the model's own draft errors.
- Use a two-pass training step for token editing (no-gradient draft, then supervised recovery), leaving inference parameters unchanged.
- Evaluate editor performance on model-specific error patterns.
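For the evaluation point, one simple way to inspect editor behavior is to compare the committed draft, the editor's output, and a reference position by position. The sketch below is a hypothetical proxy: the summary does not define how T2T edit intensity is measured, so `edit_stats` and its field names are assumptions chosen for illustration, and the token sequences are made up.

```python
def edit_stats(draft, edited, reference):
    """Count what the editor changed and whether each change helped or hurt.

    draft:     tokens committed before editing
    edited:    tokens after the editor has run
    reference: ground-truth tokens, aligned position by position
    """
    if not (len(draft) == len(edited) == len(reference)):
        raise ValueError("sequences must be aligned and of equal length")
    n = len(draft)
    changed = [i for i in range(n) if draft[i] != edited[i]]
    return {
        # share of committed tokens the editor rewrote (proxy for edit intensity)
        "edit_rate": len(changed) / n if n else 0.0,
        # wrong draft tokens turned into the reference token
        "fixed": sum(1 for i in changed if edited[i] == reference[i]),
        # correct draft tokens the editor overwrote (over-correction)
        "broken": sum(1 for i in changed if draft[i] == reference[i]),
        "accuracy_before": sum(d == r for d, r in zip(draft, reference)) / n if n else 0.0,
        "accuracy_after": sum(e == r for e, r in zip(edited, reference)) / n if n else 0.0,
    }


reference = ["Paris", "is", "in", "France"]
draft = ["Paris", "is", "in", "Spain"]
edited = ["Lyon", "is", "in", "France"]

stats = edit_stats(draft, edited, reference)
```

In this made-up case the editor fixes one error and breaks one correct token, so accuracy is unchanged even though half the tokens were edited. Tracking `fixed` and `broken` separately makes over-correction visible where an accuracy number alone would hide it.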
Topics
- Token Editing
- Diffusion Language Models
- LLaDA2.1
- LoRA Fine-tuning
- Training-Inference Mismatch
- Error Correction
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.