One-step Language Modeling via Continuous Denoising

2026-01-25 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, long

Summary

Researchers introduce a flow-based language model (FLM) and its distilled version, FMLM, challenging the notion that discrete diffusion models are inherently better suited to generative modeling over discrete modalities. FLM performs Euclidean denoising on one-hot token encodings and is trained to predict the clean data via a cross-entropy objective, with a time reparameterization for stability. FMLM is distilled from FLM and enables few-step generation. Both models are evaluated on the LM1B and OpenWebText (OWT) language datasets. FLM achieves generation quality comparable to state-of-the-art discrete diffusion models, while FMLM surpasses recent few-step language models, matching their 8-step quality in a single step, an approximately 8.3x speedup on LM1B.

Key takeaway

For research scientists developing faster, high-quality language models, this work indicates that continuous flow-based methods are a viable alternative to discrete diffusion: FLM matches its generation quality, and the distilled FMLM outperforms recent few-step models. Consider implementing the FLM and FMLM architectures, focusing on the proposed time reparameterization and flow map distillation techniques, to pursue speedups like the roughly 8.3x reported on LM1B without sacrificing generation quality.

Key insights

Continuous denoising via flow-based models can match discrete diffusion in generation quality and surpass recent few-step language models in speed.

Principles

Continuous flows avoid discrete diffusion's factorization error.
Time reparameterization improves training stability and quality.
Flow map distillation enables efficient few-step generation.
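
The first principle can be illustrated with a toy calculation. The sketch below assumes that "factorization error" refers to sampling each token independently from its per-position marginal, which loses correlations between positions; the two-token joint distribution is hypothetical and chosen only to make the effect visible.

```python
import numpy as np

# Hypothetical joint distribution over two binary tokens:
# only the sequences "00" and "11" ever occur, each with probability 0.5.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])

# Per-position marginals, which is all a fully factorized sampler uses.
p_first = joint.sum(axis=1)
p_second = joint.sum(axis=0)

# Sampling each position independently yields the product of marginals.
factorized = np.outer(p_first, p_second)

# Probability mass placed on sequences the true distribution never produces.
invalid_mass = factorized[joint == 0].sum()

print(factorized)
print(invalid_mass)
```

In this toy case half of the probability mass lands on sequences ("01" and "10") that never occur under the true distribution, even though every per-token marginal is exactly right.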

Method

FLM trains a denoiser predicting clean data using a cross-entropy objective and time reparameterization. FMLM is then distilled from FLM's associated flow map for few-step generation.
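
A minimal NumPy sketch of this training objective follows. Several pieces are assumptions not stated in the summary: the linear interpolation between Gaussian noise and the one-hot data, the uniform sampling of `t`, and `toy_denoiser`, a hypothetical linear stand-in for the actual network. The paper's time reparameterization is not specified here, so it is left out rather than guessed.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, seq_len, batch = 8, 5, 4

def one_hot(tokens, vocab_size):
    # Map integer token ids to one-hot vectors in Euclidean space.
    return np.eye(vocab_size)[tokens]

def noisy_input(x1, t, rng):
    # Assumed path: linear interpolation from Gaussian noise (t=0) to data (t=1).
    x0 = rng.standard_normal(x1.shape)
    t = t[:, None, None]
    return (1.0 - t) * x0 + t * x1

def toy_denoiser(x_t, t, W):
    # Hypothetical stand-in for the network: a per-position linear map
    # from the noisy input to vocabulary logits (it ignores t).
    return x_t @ W

def cross_entropy(logits, tokens):
    # Numerically stable log-softmax, then the mean negative
    # log-likelihood of the clean tokens.
    logits = logits - logits.max(axis=-1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, tokens[..., None], axis=-1)
    return -picked.mean()

tokens = rng.integers(0, vocab_size, size=(batch, seq_len))
x1 = one_hot(tokens, vocab_size)
t = rng.uniform(0.0, 1.0, size=batch)  # assumed uniform, no reparameterization
x_t = noisy_input(x1, t, rng)
W = np.eye(vocab_size) * 4.0
loss = cross_entropy(toy_denoiser(x_t, t, W), tokens)
print(loss)
```

The point of the sketch is the shape of the objective: the input is a continuous, noised version of the one-hot sequence, while the target remains the discrete clean token, so the loss is an ordinary classification loss rather than a regression loss.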

In practice

Implement Euclidean denoising for one-hot token encodings.
Apply time reparameterization for stable training.
Distill flow maps for accelerated one-step generation.
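
The sampling side of these steps can be sketched as follows, under stated assumptions. The velocity formula is derived for a linear noise-to-data path, which the summary does not confirm, and `toy_denoiser` is a hypothetical stand-in for a trained FLM. The Euler loop shows the multi-step baseline that a distilled flow map is meant to replace with a single learned jump from noise to data; it is not the FMLM itself.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def toy_denoiser(x_t, t):
    # Hypothetical stand-in for a trained FLM: returns a predicted
    # token distribution per position (it ignores t).
    return softmax(4.0 * x_t)

def euler_sample(x0, denoiser, num_steps):
    # Integrate dx/dt = (x1_hat - x_t) / (1 - t) from t=0 to t=1,
    # the velocity implied by an assumed linear path x_t = (1-t)*x0 + t*x1.
    x = x0.copy()
    ts = np.linspace(0.0, 1.0, num_steps + 1)
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        x1_hat = denoiser(x, t_cur)
        velocity = (x1_hat - x) / (1.0 - t_cur)  # t_cur < 1 on every step
        x = x + (t_next - t_cur) * velocity
    # Decode by taking the most likely token at each position.
    return x.argmax(axis=-1)

rng = np.random.default_rng(0)
x0 = rng.standard_normal((2, 6, 8))  # (batch, sequence length, vocabulary)

tokens_8_step = euler_sample(x0, toy_denoiser, num_steps=8)
tokens_1_step = euler_sample(x0, toy_denoiser, num_steps=1)
print(tokens_8_step.shape, tokens_1_step.shape)
```

With `num_steps=1` the Euler update collapses to a single denoiser call at `t=0`, which is generally a poor approximation of the full trajectory. Distillation addresses this by training a separate model to approximate the endpoint of the multi-step trajectory directly.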

Topics

Flow-based Language Models
Continuous Denoising
Few-step Generation
Discrete Diffusion
Language Modeling

Code references

david3684/flm

Best for: Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.