Google's new open model DiffusionGemma generates text from noise instead of word by word

2026-06-10 · Source: The Decoder · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, medium

Summary

Google has released DiffusionGemma, an experimental language model that generates text with a diffusion-based method, processing blocks of 256 tokens simultaneously instead of word by word. This approach uses graphics processors more efficiently and runs up to four times faster than traditional models in single-user mode on dedicated GPUs. Although the quality of its generated text is lower than that of conventional models, DiffusionGemma, with 26 billion parameters (3.8 billion active per step via Mixture-of-Experts), is particularly suited to non-linear tasks such as inserting text or filling gaps in code. The model builds on the Gemma 4 family and Gemini Diffusion, fits into 18 GB of VRAM when quantized, and is available with open weights and broad tool support.
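As a back-of-the-envelope check on the memory figure, here is a minimal sketch. The article gives the parameter count (26 billion) and the 18 GB footprint but not the quantization bit width, so the bit widths below are assumptions. The estimate covers weights only and ignores activations, caches, and runtime overhead.

```python
TOTAL_PARAMS = 26e9   # total parameters, from the article
VRAM_BUDGET_GB = 18   # quantized footprint, from the article


def weight_memory_gb(num_params: float, bits_per_weight: int) -> float:
    """Memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9


# Bit widths are assumptions; the article does not say which one is used.
for bits in (16, 8, 4):
    gb = weight_memory_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if gb <= VRAM_BUDGET_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:.0f} GB, {verdict} in {VRAM_BUDGET_GB} GB")
```

Under these assumptions, only the 4-bit case (about 13 GB of weights) leaves headroom inside 18 GB.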

Key takeaway

AI engineers and researchers who are optimizing local inference or tackling non-linear text generation should evaluate DiffusionGemma. Its diffusion-based, parallel approach runs up to four times faster on dedicated GPUs in single-user mode, which makes it a strong candidate for code completion, text insertion, or structured data manipulation, at the cost of lower raw text quality. Consider fine-tuning it for specific non-linear applications to make the most of these strengths.

Key insights

DiffusionGemma generates text from noise in parallel blocks of 256 tokens, which speeds up local single-user inference by up to four times and suits non-linear tasks.

Principles

Parallel token processing uses GPU compute more efficiently than word-by-word generation.
Diffusion models excel at non-linear text generation.
Mixture-of-Experts enables efficient parameter activation.
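The Mixture-of-Experts principle can be illustrated with a toy top-k router, as a minimal sketch. The article states only the totals (3.8 billion of 26 billion parameters active per step, roughly 15%). The number of experts, the value of k, the router scores, and all function names below are hypothetical and do not describe DiffusionGemma's actual architecture.

```python
import math


def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their scores."""
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    peak = max(router_scores[i] for i in top)  # subtract for numerical stability
    exps = {i: math.exp(router_scores[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_forward(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    weights = top_k_route(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


calls = []  # records which toy experts actually ran


def make_expert(idx, scale):
    def expert(x):
        calls.append(idx)
        return scale * x
    return expert


# Eight toy experts and made-up router scores (both hypothetical).
experts = [make_expert(i, scale=i + 1) for i in range(8)]
router_scores = [0.1, 2.0, -1.0, 0.3, 1.5, 0.0, -0.5, 0.2]

output = moe_forward(3.0, experts, router_scores, k=2)
print(sorted(calls))  # [1, 4]: the other six experts never ran
print(f"active share of parameters per the article: {3.8 / 26:.1%}")  # 14.6%
```

The unselected experts cost no compute for this input, which is why per-step cost tracks the active parameter count instead of the total.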

Method

The model starts with 256 random placeholder tokens and refines them over multiple passes until readable text emerges, an approach inspired by image diffusion.
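The refinement loop described above can be sketched with a toy example. This is not DiffusionGemma's algorithm: `toy_denoiser`, the confidence threshold, the vocabulary size, and the 60% hit rate are all hypothetical, and the stand-in denoiser peeks at a known target so that the loop has something to converge to. What it does show is the shape of the process: every position is updated in each pass, and confident positions are frozen while the rest are guessed again.

```python
import random

BLOCK_SIZE = 256   # block length, from the article
VOCAB_SIZE = 1000  # hypothetical toy vocabulary


def toy_denoiser(tokens, target, rng):
    """Hypothetical stand-in for the model: one (token, confidence) guess per position.

    A real model would predict from the noisy block alone; this toy peeks at
    `target` and only uses `tokens` for its length.
    """
    guesses = []
    for i in range(len(tokens)):
        if rng.random() < 0.6:  # toy assumption: 60% of guesses are right
            guesses.append((target[i], rng.uniform(0.5, 1.0)))
        else:
            guesses.append((rng.randrange(VOCAB_SIZE), rng.uniform(0.0, 0.5)))
    return guesses


def refine_block(target, seed=0, threshold=0.5, max_passes=64):
    """Start from random tokens and refine the whole block pass by pass."""
    rng = random.Random(seed)
    tokens = [rng.randrange(VOCAB_SIZE) for _ in target]  # pure noise
    settled = [False] * len(target)
    passes = 0
    while not all(settled) and passes < max_passes:
        passes += 1
        guesses = toy_denoiser(tokens, target, rng)  # all positions in one pass
        for i, (tok, conf) in enumerate(guesses):
            if not settled[i]:
                tokens[i] = tok                   # overwrite the noisy token
                settled[i] = conf >= threshold    # freeze confident positions
    return tokens, passes


gen = random.Random(1)
target = [gen.randrange(VOCAB_SIZE) for _ in range(BLOCK_SIZE)]
tokens, passes = refine_block(target)
print(f"block of {len(tokens)} tokens settled after {passes} passes")
```

The point of the sketch is the pass count: the block settles in far fewer passes than the 256 sequential steps a token-by-token decoder would need for the same length.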

In practice

Insert text into existing paragraphs efficiently.
Fill gaps in program code or structured data.
Solve constraint-based puzzles like Sudoku.
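For the insertion and gap-filling cases, one plausible way to frame the input is a block in which the known tokens are frozen and only the gap is open to refinement. The sketch below is an assumption about that framing, not the model's documented interface; `MASK`, `build_infill_block`, and `apply_guesses` are hypothetical names, and the model call itself is replaced by a list of dummy guesses.

```python
MASK = None  # hypothetical marker for a position the model must fill


def build_infill_block(prefix, suffix, block_size=256):
    """Lay out known tokens around a gap and mark which positions are frozen."""
    gap = block_size - len(prefix) - len(suffix)
    if gap <= 0:
        raise ValueError("prefix and suffix leave no room for a gap")
    block = list(prefix) + [MASK] * gap + list(suffix)
    frozen = [True] * len(prefix) + [False] * gap + [True] * len(suffix)
    return block, frozen


def apply_guesses(block, frozen, guesses):
    """Accept guesses only at unfrozen positions; known tokens stay untouched."""
    return [old if keep else new
            for old, keep, new in zip(block, frozen, guesses)]


# Tiny block for readability; the article's block size is 256.
prefix = ["def", "add", "(a, b):"]
suffix = ["return", "result"]
block, frozen = build_infill_block(prefix, suffix, block_size=8)

dummy_guesses = ["<guess>"] * len(block)  # stands in for one model pass
print(apply_guesses(block, frozen, dummy_guesses))
```

Because the text on both sides of the gap is visible in every pass, the filled span can be conditioned on what follows it as well as what precedes it, which left-to-right generation cannot do directly.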

Topics

DiffusionGemma
Diffusion Models
Text Generation
GPU Acceleration
Mixture-of-Experts
Non-linear Text Tasks
Open Weights

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.