Google open-sources speedy DiffusionGemma text diffusion model

2026-06-10 · Source: AI – SiliconANGLE · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Advanced, quick

Summary

Google LLC has open-sourced DiffusionGemma, a new large language model released on June 10, 2026, that generates text with a diffusion approach. The model generates text four times faster than conventional LLMs and consumes less RAM, so it can run on high-end consumer graphics cards. DiffusionGemma achieves over 1,000 tokens/second on an Nvidia H100 GPU and 700+ tokens/second on a GeForce RTX 5090. Its speed comes from a parallelization method that produces 256 tokens simultaneously. The model uses a mixture-of-experts architecture that activates only 3.8 billion of its 26 billion parameters, and it stores data in the lightweight NVFP4 format to further reduce memory usage. It is based on Gemma 4 26B A4B, with a modified attention mechanism for improved prompt interpretation.

Key takeaway

For AI engineers deploying LLMs, DiffusionGemma offers significantly faster text generation on more accessible hardware. At over 700 tokens/second on a GeForce RTX 5090, it can reduce the need for expensive server-grade GPUs. Consider evaluating the open-source model from Hugging Face to improve performance and lower operating costs for your text generation applications.

Key insights

DiffusionGemma offers significantly faster, memory-efficient text generation by applying text diffusion and parallel processing to large language models.

Principles

Text diffusion enables parallel token generation.
Mixture-of-experts reduces active parameter count.
Lightweight data formats lower RAM consumption.
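
To make the memory principle concrete, here is a rough back-of-the-envelope estimate of weight storage for a 26-billion-parameter model. It is a sketch, not a measurement: the 16-bit baseline is an assumed point of comparison, and the NVFP4 cost of 4.5 bits per weight assumes 4-bit values plus one 8-bit scale shared by each block of 16 values. The function name and constants are illustrative.

```python
# Weight-only estimate; ignores activations, caches, and runtime overhead.

TOTAL_PARAMS = 26e9    # total parameters, from the summary
ACTIVE_PARAMS = 3.8e9  # active parameters, from the summary

# Assumed storage cost per weight, in bits.
BITS_BF16 = 16.0  # assumed 16-bit baseline for comparison
# Assumption: 4-bit values plus one 8-bit scale per 16-value block.
BITS_NVFP4 = 4.0 + 8.0 / 16.0


def weight_gigabytes(params: float, bits_per_weight: float) -> float:
    """Return weight storage in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    print(f"16-bit weights: {weight_gigabytes(TOTAL_PARAMS, BITS_BF16):.1f} GB")
    print(f"NVFP4 weights:  {weight_gigabytes(TOTAL_PARAMS, BITS_NVFP4):.1f} GB")
    print(f"Active share:   {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under these assumptions the weights shrink from about 52 GB to about 14.6 GB. Note that mixture-of-experts mainly lowers the compute per token: in a typical deployment all experts still have to be stored, so the compact data format is what drives the reduction in weight memory.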

Method

DiffusionGemma starts from random text, iteratively replaces subsets of it with relevant words, and reviews its edits until the response to the prompt is complete.
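
The loop described above can be illustrated with a toy sketch. This is not DiffusionGemma's algorithm or API: `propose` is a hypothetical stand-in that looks up a known answer (and is deliberately wrong some of the time), and the "review" is a simple equality check, whereas a real model would rely on learned predictions and confidence scores. All names and parameters here are assumptions made for illustration.

```python
import random

VOCAB = ["the", "cat", "sat", "on", "a", "mat", "dog", "ran", "fast", "slow"]


def propose(target, position, rng, error_rate):
    """Hypothetical stand-in for the model's proposal at one position.

    A real model would predict the token from the prompt and current draft;
    this toy looks up a known answer and is sometimes wrong on purpose.
    """
    if rng.random() < error_rate:
        return rng.choice(VOCAB)
    return target[position]


def generate(target, tokens_per_step=2, error_rate=0.3, seed=0, max_steps=100):
    """Refine a random draft toward `target`, several positions per step."""
    rng = random.Random(seed)
    # Start from random text of the desired length.
    draft = [rng.choice(VOCAB) for _ in target]
    accepted = [False] * len(target)
    steps = 0
    while not all(accepted) and steps < max_steps:
        # Replace a subset of the positions that are still unresolved.
        pending = [i for i, done in enumerate(accepted) if not done]
        chosen = rng.sample(pending, min(tokens_per_step, len(pending)))
        for i in chosen:
            draft[i] = propose(target, i, rng, error_rate)
        # Review the edits: keep correct ones, retry the rest in a later step.
        for i in chosen:
            accepted[i] = draft[i] == target[i]
        steps += 1
    return draft, steps


if __name__ == "__main__":
    answer = ["the", "cat", "sat", "on", "a", "mat"]
    draft, steps = generate(answer)
    print(" ".join(draft), f"(after {steps} steps)")
```

The point of the sketch is the shape of the loop: several positions are filled in per step rather than one token at a time, and rejected edits are revisited until the whole draft is accepted.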

In practice

Run LLMs on consumer-grade GPUs.
Achieve 700+ tokens/second on RTX 5090.
Use the open-source DiffusionGemma release on Hugging Face.
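
As a rough planning aid, the reported throughput figures can be turned into generation-time estimates. The sketch below assumes a hypothetical 500-token response and treats the reported figures as sustained rates; because the source gives them as lower bounds ("over 1,000" and "700+"), the results are approximate upper bounds on generation time and exclude prompt processing.

```python
# Reported lower-bound throughput in tokens/second, from the summary.
THROUGHPUT = {
    "Nvidia H100": 1000.0,
    "GeForce RTX 5090": 700.0,
}


def generation_seconds(num_tokens: int, tokens_per_second: float) -> float:
    """Time to emit num_tokens at a steady tokens_per_second rate."""
    return num_tokens / tokens_per_second


if __name__ == "__main__":
    response_tokens = 500  # hypothetical response length
    for gpu, rate in THROUGHPUT.items():
        secs = generation_seconds(response_tokens, rate)
        print(f"{gpu}: about {secs:.2f} s for {response_tokens} tokens")
```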

Topics

DiffusionGemma
Text Diffusion
Large Language Models
GPU Inference
Mixture-of-Experts
Open-Source AI

Best for: NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AI – SiliconANGLE.