Gemma, DeepMind's Family of Open Models — Omar Sanseviero, Google DeepMind
Summary
Google DeepMind recently released Gemma 4, a new family of open large language models ranging from about 2 billion to 32 billion parameters. The models are designed for on-device deployment: the smallest versions run on Android, iOS, and Raspberry Pi and support multimodal understanding, reasoning, and agentic tasks offline. The family includes a Mixture of Experts (MoE) model for low-latency applications and a 31B-parameter model for maximum intelligence, all of which fit on consumer GPUs. A key architectural innovation, E2B (effectively 2 billion parameters), uses per-layer embeddings to optimize smaller models for mobile, so that parts of the model can be served from CPU or disk. Gemma 4 also ships under a new Apache 2.0 license, offers enhanced multilingual capabilities from training on over 140 languages with a Gemini-based tokenizer, and has seen rapid community adoption, with over 10 million downloads and 1,000 community-tuned models in its first week.
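Whether a given model fits on a particular device depends largely on how many bits are used to store each weight. The following is a rough back-of-the-envelope sketch, not a figure from the talk: the function name and the choice of 16, 8, and 4 bits per weight are assumptions for illustration, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in GB (10^9 bytes).

    Hypothetical helper: counts weights only, ignoring activations,
    KV cache, and framework overhead.
    """
    return num_params * bits_per_param / 8 / 1e9


if __name__ == "__main__":
    # Parameter counts taken from the summary; bit widths are assumed.
    for name, params in [("2B", 2e9), ("31B", 31e9)]:
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{name} at {bits}-bit: ~{gb:.1f} GB of weights")
```

Comparing these numbers with the memory of the target phone, board, or GPU gives a first estimate of which model sizes and precisions are worth testing.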
Key takeaway
For AI Architects and MLOps Engineers evaluating on-device AI solutions, Gemma 4 is a compelling option because of its Apache 2.0 license, range of parameter sizes (about 2B to 32B), and demonstrated on-device performance on Android, iOS, and Raspberry Pi. Your teams should explore Gemma 4 for offline agentic applications, multimodal understanding, and multilingual tasks, especially where data privacy or low-latency inference is critical, taking advantage of an architecture optimized for consumer-grade hardware.
Key insights
Gemma 4 offers highly capable, open, on-device LLMs with an Apache 2.0 license and both multimodal and multilingual support.
Principles
- Prioritize on-device capability for open models.
- Optimize architecture for efficient mobile deployment.
- Foster ecosystem through open licensing and tooling.
Method
The Gemma E2B architecture uses per-layer embeddings that act as a lookup table. Because a lookup needs no GPU computation, these embeddings can be kept on CPU or on disk, which lets the smaller models run efficiently on-device.
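The lookup-table idea can be illustrated with a minimal sketch. This is not Gemma's implementation: the table sizes, the file layout, and the function names below are hypothetical, and NumPy's memory mapping stands in for whatever storage mechanism a real runtime uses. It shows only that fetching per-layer embedding rows is an indexing operation that can read from disk on demand.

```python
import os
import tempfile

import numpy as np

# Toy sizes chosen for illustration only (not Gemma's real dimensions).
NUM_LAYERS, VOCAB_SIZE, PLE_DIM = 4, 1000, 8
TABLE_SHAPE = (NUM_LAYERS, VOCAB_SIZE, PLE_DIM)


def create_table(path: str) -> None:
    """Write a random per-layer table to disk (stand-in for trained weights)."""
    rng = np.random.default_rng(0)
    rng.standard_normal(TABLE_SHAPE).astype(np.float32).tofile(path)


def open_table(path: str) -> np.memmap:
    """Memory-map the table so rows are read from disk only when indexed."""
    return np.memmap(path, dtype=np.float32, mode="r", shape=TABLE_SHAPE)


def lookup(table, layer: int, token_ids) -> np.ndarray:
    """Fetch rows for one layer and a batch of token ids: indexing, no matmul."""
    return np.asarray(table[layer, np.asarray(token_ids)])


def demo():
    """Build a table in a temp directory and look up three tokens in layer 2."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "per_layer_embeddings.bin")
        create_table(path)
        table = open_table(path)
        rows = lookup(table, layer=2, token_ids=[5, 42, 7])
        del table  # release the mapping before the directory is removed
        return rows


if __name__ == "__main__":
    print(demo().shape)
```

Only the rows for the requested layer and tokens are copied into memory, so the full table never has to sit on the accelerator.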
In practice
- Run Gemma 4 on Android, iOS, or Raspberry Pi.
- Fine-tune Gemma for low-resource languages.
- Use ShieldGemma for content moderation.
Topics
- Gemma 4
- Open Models
- On-Device AI
- Multimodal Capabilities
- Multilingual Support
Best for: CTO, AI Architect, MLOps Engineer, Machine Learning Engineer, AI Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineer.