T3D: Few-Step Diffusion Language Models via Trajectory Self-Distillation with Direct Discriminative Optimization
Summary
T3D introduces a trajectory self-distillation framework that makes Diffusion Large Language Models (DLLMs) more effective at few-step text generation. DLLMs can decode many tokens in parallel, but their practical inference speed is often limited by the many refinement steps they require, and cutting the number of steps aggressively degrades generation quality. T3D addresses this by distilling the model's own generative trajectories and adding Direct Discriminative Optimization (DDO), a reverse-KL objective. DDO makes the distillation mode-seeking, so the student model concentrates on high-probability teacher modes. The approach consistently outperforms existing few-step baselines and standard training under strict step budgets, substantially narrowing the gap with full-step decoding and laying a foundation for practical few-step DLLMs.
Key takeaway
For research scientists developing efficient text generation models, T3D offers a pathway to significantly improve few-step Diffusion LLM performance. You should consider integrating trajectory self-distillation and Direct Discriminative Optimization into your training pipelines to achieve faster inference without substantial quality degradation, especially when operating under tight computational budgets.
Key insights
Trajectory self-distillation with Direct Discriminative Optimization improves few-step diffusion language model efficiency.
Principles
- Distill generative trajectories for efficiency.
- Reverse-KL objective promotes mode-seeking distillation.
Method
The method combines two components. In trajectory self-distillation, the model distills its own generative trajectories. Direct Discriminative Optimization (DDO), a reverse-KL objective, makes that distillation mode-seeking, so the student concentrates on high-probability teacher modes.
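To make the mode-seeking behaviour of a reverse-KL objective concrete, here is a toy sketch. It is not T3D's training objective or the DDO loss; it only compares forward and reverse KL on a hand-made discrete distribution. The bimodal `teacher`, the single-peak student family, and all function names are assumptions introduced for illustration.

```python
import math

def normalize(weights):
    """Scale non-negative weights so they sum to 1."""
    total = sum(weights)
    return [w / total for w in weights]

def discrete_gaussian(mean, std, support):
    """Gaussian-shaped probability mass function on a finite integer support."""
    return normalize([math.exp(-((x - mean) ** 2) / (2 * std ** 2)) for x in support])

def kl(a, b):
    """KL(a || b) for two distributions on the same support."""
    return sum(ai * math.log(ai / bi) for ai, bi in zip(a, b) if ai > 0)

support = list(range(21))

# Hypothetical bimodal teacher: equal-weight peaks at 5 and 15.
left = discrete_gaussian(5, 1.0, support)
right = discrete_gaussian(15, 1.0, support)
teacher = [0.5 * l + 0.5 * r for l, r in zip(left, right)]

# Hypothetical student family: one peak, so it cannot match both modes at once.
candidates = {mu: discrete_gaussian(mu, 1.5, support) for mu in support}

# Forward KL, KL(teacher || student): mass-covering.
forward_best = min(candidates, key=lambda mu: kl(teacher, candidates[mu]))

# Reverse KL, KL(student || teacher): mode-seeking.
reverse_best = min(candidates, key=lambda mu: kl(candidates[mu], teacher))

print("forward-KL student mean:", forward_best)
print("reverse-KL student mean:", reverse_best)
```

In this toy setup the forward-KL student settles at 10, between the two peaks where the teacher has almost no mass, while the reverse-KL student commits to one of the teacher's peaks (5 or 15, which are tied by symmetry). That commitment is what "concentrating on high-probability teacher modes" means here.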
In practice
- Apply DDO for few-step DLLM training.
- Use trajectory distillation to reduce inference steps.
Topics
- Diffusion Language Models
- Trajectory Self-Distillation
- Direct Discriminative Optimization
- Few-Step Decoding
- Text Generation
Best for: Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.