Google's new open model DiffusionGemma generates text from noise instead of word by word
Summary
Google has released DiffusionGemma, an experimental language model that generates text with a diffusion-based method, processing blocks of 256 tokens simultaneously instead of word by word. This approach uses graphics processors more efficiently and runs up to four times faster than traditional models in single-user mode on dedicated GPUs. Its text quality is lower than that of conventional models, but DiffusionGemma is particularly suited to non-linear tasks such as inserting text or filling gaps in code. The model has 26 billion parameters, of which 3.8 billion are active per step via Mixture-of-Experts. It builds on the Gemma 4 family and Gemini Diffusion, fits into 18 GB of VRAM when quantized, and is available with open weights and broad tool support.
Key takeaway
AI engineers and researchers who optimize local inference or work on non-linear text generation should evaluate DiffusionGemma. Its diffusion-based, parallel approach runs up to four times faster on dedicated GPUs for single-user tasks, which makes it a good fit for code completion, text insertion, and structured data manipulation, at the cost of lower raw text quality. Consider fine-tuning it for specific non-linear applications to get the most out of these strengths.
Key insights
DiffusionGemma generates text from noise in parallel blocks of 256 tokens, which speeds up local single-user inference by up to four times and suits non-linear tasks.
Principles
- Parallel token processing maximizes GPU compute utilization.
- Diffusion models excel at non-linear text generation.
- Mixture-of-Experts enables efficient parameter activation.
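The parameter figures from the summary can be turned into two rough ratios. The short calculation below is only back-of-the-envelope arithmetic: it assumes that 18 GB means 18 × 10^9 bytes and that all of it holds weights, which overstates the bits per weight because activations and caches also need memory.

```python
# Figures quoted in the summary above.
TOTAL_PARAMS = 26e9    # total parameters
ACTIVE_PARAMS = 3.8e9  # parameters active per step (Mixture-of-Experts)
VRAM_BYTES = 18e9      # ASSUMPTION: 18 GB = 18e9 bytes, all used for weights


def active_fraction(active: float, total: float) -> float:
    """Share of the parameters that take part in a single step."""
    return active / total


def bits_per_param(vram_bytes: float, total: float) -> float:
    """Upper bound on the average storage per weight, in bits."""
    return vram_bytes * 8 / total


frac = active_fraction(ACTIVE_PARAMS, TOTAL_PARAMS)
bits = bits_per_param(VRAM_BYTES, TOTAL_PARAMS)
print(f"active per step: {frac:.1%}")        # 14.6%
print(f"at most {bits:.2f} bits per weight")  # 5.54
```

Under these assumptions roughly one parameter in seven is used per step, and the quantized weights average at most about 5.5 bits each.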
Method
The model starts with 256 random placeholder tokens and refines them across multiple passes until readable text emerges, inspired by image diffusion.
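The refinement loop described above can be illustrated with a toy sketch. This is not DiffusionGemma's implementation: `toy_denoiser` is a hypothetical stand-in that already knows the target text and corrects wrong positions with a growing probability, and the number of passes (8) is an assumption, since the summary does not state it. Only the block size of 256 comes from the article.

```python
import random

BLOCK_SIZE = 256  # block size quoted in the article


def toy_denoiser(block, target, rng, fix_rate):
    """Hypothetical stand-in for the neural network.

    It proposes a token for every position in one call (in parallel).
    A wrong position is corrected with probability fix_rate; a position
    that is already correct stays correct.
    """
    return [
        t if (b == t or rng.random() < fix_rate) else b
        for b, t in zip(block, target)
    ]


def generate_block(target, vocab, passes=8, seed=0):
    """Start from random tokens and refine the whole block over several passes."""
    rng = random.Random(seed)
    block = [rng.choice(vocab) for _ in range(len(target))]  # pure noise
    history = [block]
    for step in range(passes):
        fix_rate = (step + 1) / passes  # reaches 1.0 on the last pass
        block = toy_denoiser(block, target, rng, fix_rate)
        history.append(block)
    return block, history


# Toy "text" of exactly 256 tokens.
target = "the quick brown fox".split() * (BLOCK_SIZE // 4)
vocab = sorted(set(target)) + ["<noise>"]

final, history = generate_block(target, vocab)
errors = [sum(a != b for a, b in zip(h, target)) for h in history]
print(errors)  # wrong positions after each pass, ending at 0
```

The point of the sketch is the control flow: each pass touches all 256 positions at once, so the number of sequential model calls depends on the number of passes rather than on the number of tokens.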
In practice
- Insert text into existing paragraphs efficiently.
- Fill gaps in program code or structured data.
- Solve constraint-based puzzles like Sudoku.
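Gap filling of this kind can be sketched as refinement with the given tokens held fixed. The example below is a minimal illustration under stated assumptions, not the model's actual procedure: `infill`, `toy_propose`, and the lookup table are hypothetical names invented here, and the "model" is a two-entry table that predicts a token from its neighbours on both sides.

```python
MASK = None  # marks a gap that should be filled


def infill(tokens, propose, passes=4):
    """Fill MASK positions while keeping the given tokens fixed."""
    known = {i: t for i, t in enumerate(tokens) if t is not MASK}
    block = [t if t is not MASK else "<noise>" for t in tokens]
    for _ in range(passes):
        block = propose(block)      # re-predict every position at once
        for i, t in known.items():  # clamp: never overwrite the given text
            block[i] = t
    return block


def toy_propose(block):
    """Hypothetical stand-in for the model: uses left AND right context."""
    table = {("(", ","): "a", (",", ")"): "b"}
    out = []
    for i, tok in enumerate(block):
        left = block[i - 1] if i > 0 else None
        right = block[i + 1] if i + 1 < len(block) else None
        out.append(table.get((left, right), tok))
    return out


code = ["def", "add", "(", MASK, ",", MASK, ")", ":"]
filled = infill(code, toy_propose)
print(filled)  # ['def', 'add', '(', 'a', ',', 'b', ')', ':']
```

Because every position is predicted with context from both directions, text to the right of a gap constrains what goes into it, which a strictly left-to-right generator cannot exploit in the same way.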
Topics
- DiffusionGemma
- Diffusion Models
- Text Generation
- GPU Acceleration
- Mixture-of-Experts
- Non-linear Text Tasks
- Open Weights
Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.