One-step Language Modeling via Continuous Denoising

Source: cs.CL updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, long

Summary

Researchers introduce a flow-based language model (FLM) and its distilled version, FMLM, challenging the notion that discrete diffusion models are superior for generative modeling over discrete modalities. FLM performs Euclidean denoising on one-hot token encodings and is trained to predict the clean data with a cross-entropy objective, using a time reparameterization for stability. FMLM, distilled from FLM, enables few-step generation. Both models were evaluated on the LM1B and OpenWebText (OWT) datasets. FLM achieved generation quality comparable to state-of-the-art discrete diffusion models, while FMLM surpassed recent few-step language models, matching their 8-step quality in a single step, a speedup of roughly 8.3x on LM1B.
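
To make the step-count comparison concrete, here is a minimal sketch contrasting a multi-step sampler with a one-step flow map. It is an illustration, not the paper's implementation. It assumes a linear path between noise and data, so the velocity implied by a clean-data prediction is (x1_hat - x) / (1 - t). The names `euler_sample`, `flow_map_sample`, `toy_denoise`, and `toy_flow_map` are hypothetical, and the toy functions stand in for trained networks.

```python
def euler_sample(x0, denoise, num_steps):
    """Multi-step sampler: one denoiser call per Euler step from t=0 to t=1."""
    x = list(x0)
    dt = 1.0 / num_steps
    calls = 0
    for k in range(num_steps):
        t = k * dt
        x1_hat = denoise(x, t)  # predicted clean point at time t
        calls += 1
        # Velocity implied by an (assumed) linear path: (x1_hat - x) / (1 - t).
        x = [xi + dt * (pi - xi) / (1.0 - t) for xi, pi in zip(x, x1_hat)]
    return x, calls


def flow_map_sample(x0, flow_map):
    """One-step sampler: a single call that jumps from t=0 straight to t=1."""
    return flow_map(x0, 0.0, 1.0), 1


def toy_denoise(x, t):
    """Hypothetical stand-in for a trained denoiser: always predicts token 1."""
    return [0.0, 1.0, 0.0]


def toy_flow_map(x, s, t):
    """Hypothetical stand-in for a distilled flow map from time s to time t."""
    return [0.0, 1.0, 0.0]


if __name__ == "__main__":
    start = [0.3, -0.2, 0.5]
    _, multi_calls = euler_sample(start, toy_denoise, num_steps=8)
    _, single_calls = flow_map_sample(start, toy_flow_map)
    print("network calls:", multi_calls, "vs", single_calls)
```

With a single Euler step the update collapses to the denoiser's prediction at t=0, which is one plausible reason a separately distilled flow map is needed for good one-step samples.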

Key takeaway

For research scientists developing faster, high-quality language models, this work indicates that continuous flow-based methods are a viable alternative to discrete diffusion, and a stronger one in the few-step regime. Consider implementing the FLM and FMLM approaches, focusing on the proposed time reparameterization and flow map distillation, to pursue speedups like the roughly 8.3x reported on LM1B without sacrificing generation quality.

Key insights

Continuous denoising with flow-based models can match discrete diffusion in generation quality and, once distilled, outperform recent few-step language models while using far fewer sampling steps.

Method

FLM trains a denoiser predicting clean data using a cross-entropy objective and time reparameterization. FMLM is then distilled from FLM's associated flow map for few-step generation.
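
The training objective described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: it assumes a linear interpolation between Gaussian noise and the one-hot encoding, uses a hypothetical `toy_denoiser` in place of a neural network, and omits the time reparameterization because the summary does not give its form.

```python
import math
import random


def one_hot(token_id, vocab_size):
    return [1.0 if i == token_id else 0.0 for i in range(vocab_size)]


def noisy_interpolant(tokens, vocab_size, t, rng):
    """Per position: x_t = (1 - t) * noise + t * one_hot(token) (assumed path)."""
    x_t = []
    for tok in tokens:
        clean = one_hot(tok, vocab_size)
        noise = [rng.gauss(0.0, 1.0) for _ in range(vocab_size)]
        x_t.append([(1.0 - t) * n + t * c for n, c in zip(noise, clean)])
    return x_t


def log_softmax(logits):
    m = max(logits)
    lse = m + math.log(sum(math.exp(v - m) for v in logits))
    return [v - lse for v in logits]


def toy_denoiser(x_t, t):
    """Hypothetical stand-in for the network: scales the noisy input into logits."""
    return [[10.0 * v for v in row] for row in x_t]


def denoising_cross_entropy(tokens, vocab_size, t, rng, denoiser=toy_denoiser):
    """Mean cross-entropy between predicted token logits and the clean tokens."""
    x_t = noisy_interpolant(tokens, vocab_size, t, rng)
    logits = denoiser(x_t, t)
    losses = [-log_softmax(row)[tok] for row, tok in zip(logits, tokens)]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    rng = random.Random(0)
    for t in (0.0, 0.5, 1.0):
        print(t, denoising_cross_entropy([2, 0, 1], 4, t, rng))
```

The key design point is that the input lives in continuous Euclidean space while the target stays categorical, so the loss is an ordinary cross-entropy over the vocabulary rather than a squared error on the one-hot vectors.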

Best for: Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.