Continuous Diffusion Scales Competitively with Discrete Diffusion for Language
Summary
RePlaid, a continuous diffusion language model (DLM), scales competitively with discrete diffusion approaches, challenging the prior belief that continuous diffusion scales poorly for language. By aligning Plaid's architecture with modern discrete DLMs, RePlaid establishes the first scaling law for continuous DLMs that rivals discrete counterparts. It shows a compute gap of only 20x relative to autoregressive models, surpasses Duo with fewer parameters, and outperforms MDLM in the over-trained regime. Benchmarked on OpenWebText, RePlaid achieves a new perplexity (PPL) bound of 22.1 among continuous DLMs, along with superior generation quality. These findings suggest that likelihood-trained continuous diffusion is a competitive and scalable alternative to discrete DLMs.
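The PPL figure above is reported as a bound because diffusion models are typically evaluated through a variational upper bound on negative log-likelihood (NLL) rather than an exact likelihood. The minimal sketch below shows the standard conversion between a per-token NLL bound in nats and perplexity. The function names are illustrative and not taken from the paper, and the per-token NLL shown is simply the value implied by the reported 22.1, not an additional result.

```python
import math


def perplexity_from_nll(total_nll_nats: float, num_tokens: int) -> float:
    """Convert a summed NLL (or NLL upper bound) in nats to per-token perplexity."""
    return math.exp(total_nll_nats / num_tokens)


def nll_per_token_from_perplexity(ppl: float) -> float:
    """Inverse mapping: per-token NLL in nats implied by a perplexity value."""
    return math.log(ppl)


# The reported bound of 22.1 corresponds to roughly 3.1 nats per token.
implied_nll = nll_per_token_from_perplexity(22.1)

# Round trip over a hypothetical evaluation set of 1,000 tokens.
round_trip_ppl = perplexity_from_nll(implied_nll * 1000, 1000)
```

Because the exponential is monotonic, an upper bound on NLL translates directly into an upper bound on perplexity.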
Key takeaway
For research scientists exploring next-generation language models, RePlaid's performance indicates that continuous diffusion, particularly with likelihood-based training, is a viable and scalable alternative to discrete methods. You should investigate continuous DLMs for their potential to achieve competitive PPL bounds and generation quality, especially when optimizing noise schedules and embeddings.
Key insights
Continuous diffusion language models, when likelihood-trained, can scale competitively with discrete diffusion models.
Principles
- Likelihood-based training improves continuous DLM scalability.
- Optimizing noise schedules yields linear cross-entropy.
- Embedding optimization drives significant likelihood gains.
Method
RePlaid aligns a continuous diffusion architecture (Plaid) with modern discrete DLMs to establish competitive scaling laws and performance benchmarks.
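To make the notions of a scaling law and a compute gap concrete, the sketch below fits a power law of the form loss ≈ a · C^(-b) in log-log space and measures how much more compute one model family needs to reach the same loss as another. It is an illustration under stated assumptions, not the paper's fitting procedure: the functional form, the function names, and all data points are hypothetical, and the synthetic curves are constructed so that the gap comes out to 20x purely to mirror the figure quoted in the summary.

```python
import numpy as np


def fit_power_law(compute, loss):
    """Fit loss ≈ a * compute**(-b) by least squares in log-log space."""
    slope, intercept = np.polyfit(np.log(compute), np.log(loss), 1)
    return float(np.exp(intercept)), float(-slope)


def compute_to_reach(a, b, target_loss):
    """Invert loss = a * C**(-b) to get the compute C needed for target_loss."""
    return (a / target_loss) ** (1.0 / b)


def compute_gap(fit_model, fit_reference, target_loss):
    """Ratio of compute needed by the model vs. the reference at equal loss."""
    return (compute_to_reach(*fit_model, target_loss)
            / compute_to_reach(*fit_reference, target_loss))


# Synthetic data (NOT from the paper): training compute in FLOPs.
compute = np.array([1e17, 1e18, 1e19, 1e20])

# Hypothetical reference curve, standing in for an autoregressive baseline.
reference_loss = 50.0 * compute ** -0.06

# Hypothetical diffusion curve: same exponent, shifted so that it needs
# 20x the compute to match the reference at any given loss.
diffusion_loss = 50.0 * (compute / 20.0) ** -0.06

reference_fit = fit_power_law(compute, reference_loss)
diffusion_fit = fit_power_law(compute, diffusion_loss)
gap = compute_gap(diffusion_fit, reference_fit, target_loss=4.0)
```

When the two fitted exponents are equal, as in this constructed example, the gap is the same at every target loss. With real measurements the exponents usually differ, so the gap depends on where along the curve it is evaluated.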
In practice
- Consider continuous DLMs for language generation.
- Focus on likelihood-based training for scalability.
- Optimize embeddings for better likelihood.
Topics
- Continuous Diffusion
- Diffusion Language Models
- Scaling Laws
- Likelihood-based Training
- RePlaid
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.