Gemma, DeepMind's Family of Open Models — Omar Sanseviero, Google DeepMind

2026-04-20 · Source: AI Engineer · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Intermediate, long

Summary

Google DeepMind recently released Gemma 4, a new family of open large language models ranging from 2 billion to 32 billion parameters. The models are designed for on-device deployment: the smallest versions run on Android, iOS, and Raspberry Pi, and support multimodal understanding, reasoning, and agentic tasks offline. The family includes a Mixture of Experts (MoE) model for low-latency applications and a 31B parameter model for maximum intelligence, all fitting within consumer GPUs. A key architectural innovation in the E2B model (effectively 2 billion parameters) is per-layer embeddings, which optimize smaller models for mobile by letting part of the model be held in CPU memory or on disk instead of on the accelerator. Gemma 4 also ships under a new Apache 2.0 license, offers enhanced multilingual capabilities from training on over 140 languages with a Gemini-based tokenizer, and has seen rapid community adoption, with over 10 million downloads and 1,000 community-tuned models in its first week.

Key takeaway

For AI Architects and MLOps Engineers evaluating on-device AI solutions, Gemma 4 is a compelling option because of its Apache 2.0 license, range of parameter sizes (2B-32B), and demonstrated on-device performance on Android, iOS, and Raspberry Pi. Teams should explore Gemma 4 for offline agentic applications, multimodal understanding, and multilingual tasks, especially where data privacy or low-latency inference is critical, taking advantage of an architecture optimized for consumer-grade hardware.

Key insights

Gemma 4 offers highly capable, open, on-device LLMs with an Apache 2.0 license and both multimodal and multilingual support.

Principles

Prioritize on-device capability for open models.
Optimize architecture for efficient mobile deployment.
Foster ecosystem through open licensing and tooling.

Method

The Gemma E2B architecture uses per-layer embeddings as a lookup table. Because a lookup only reads the rows it needs, this part of the model does not have to sit on the GPU: it can be offloaded to CPU or disk, which makes the smaller models efficient to run on-device.
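
The lookup-table idea in the Method can be illustrated with a toy sketch. This is not Gemma's implementation: the file layout, the sizes, and the names `write_tables` and `PerLayerEmbeddingStore` are all hypothetical, chosen only to show how a table stored on disk can serve one row per (layer, token) without loading the whole table into memory.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical toy sizes; real models are far larger.
NUM_LAYERS = 2              # layers that each own an embedding table
VOCAB_SIZE = 8              # tokens per table
EMBED_DIM = 4               # floats per embedding row
ROW_BYTES = EMBED_DIM * 4   # each value stored as a 4-byte float32


def write_tables(path):
    """Write toy per-layer tables to disk, one layer after another."""
    with open(path, "wb") as f:
        for layer in range(NUM_LAYERS):
            for token in range(VOCAB_SIZE):
                # Made-up values that encode (layer, token) for easy checking.
                row = [layer * 100.0 + token + 0.25 * d for d in range(EMBED_DIM)]
                f.write(struct.pack(f"<{EMBED_DIM}f", *row))


class PerLayerEmbeddingStore:
    """Serve embedding rows from a memory-mapped file (assumed layout)."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # The OS pages in only the parts of the file that are actually read.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def lookup(self, layer, token_id):
        """Return the embedding row for one token at one layer."""
        if not (0 <= layer < NUM_LAYERS and 0 <= token_id < VOCAB_SIZE):
            raise IndexError("layer or token_id out of range")
        offset = (layer * VOCAB_SIZE + token_id) * ROW_BYTES
        raw = self._map[offset:offset + ROW_BYTES]
        return list(struct.unpack(f"<{EMBED_DIM}f", raw))

    def close(self):
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "per_layer_embeddings.bin")
    write_tables(path)
    store = PerLayerEmbeddingStore(path)
    print(store.lookup(layer=1, token_id=3))
    store.close()
```

Each lookup touches `ROW_BYTES` bytes regardless of how large the table is, which is why a table used this way is a plausible candidate for CPU memory or disk.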

In practice

Run Gemma 4 on Android, iOS, or Raspberry Pi.
Fine-tune Gemma for low-resource languages.
Use ShieldGemma for content moderation.

Topics

Gemma 4
Open Models
On-Device AI
Multimodal Capabilities
Multilingual Support

Best for: CTO, AI Architect, MLOps Engineer, Machine Learning Engineer, AI Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.