Google announces Gemma 4 open AI models, switches to Apache 2.0 license

2026-04-02 · Source: AI - Ars Technica · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Intermediate, medium

Summary

Google has released Gemma 4, a major update to its open-weight AI models, in four sizes optimized for local use and under the more permissive Apache 2.0 license. The two larger variants, 26B Mixture of Experts (MoE) and 31B Dense, target high-performance local machines; the 26B MoE model activates only 3.8 billion parameters at inference time, which speeds up generation. The two smaller models, Effective 2B (E2B) and Effective 4B (E4B), are optimized for mobile and IoT devices, with low memory use and what is billed as near-zero latency. Built on the same technology as Gemini 3, the Gemma 4 models offer improved reasoning, math, instruction following, and code generation, along with native function calling and structured JSON output. They support more than 140 languages and offer context windows of up to 256k tokens for the larger models and 128k for the edge models. The release also confirms the upcoming Gemini Nano 4, which is derived from Gemma 4 E2B and E4B.

Key takeaway

For CTOs and VPs of Engineering evaluating local AI deployment strategies, Gemma 4's move to the Apache 2.0 license reduces legal friction and widens the range of permitted use cases, particularly for agentic workflows and offline code generation. Teams should consider prototyping with Gemma 4 E2B and E4B for forward compatibility with Gemini Nano 4, which would give them privacy-preserving AI capabilities on edge devices without cloud dependency.

Key insights

Gemma 4 offers open-weight AI models optimized for local and mobile use, and its Apache 2.0 license gives developers more freedom in how they use them.

Principles

Local AI processing enhances data control.
Open licenses foster broader adoption.
Parameter activation impacts inference speed.
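
To make the last principle concrete, here is a rough back-of-the-envelope sketch using the figures from the summary (26B total and 3.8B active parameters for the MoE model, 31B for the Dense model). It assumes that per-token compute scales roughly with the number of active parameters, which is a simplification; the function name is hypothetical, and the full set of weights still has to fit in memory regardless of how many are active.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Return the share of parameters used at inference time.

    Both arguments are parameter counts in billions.
    """
    if total_params_b <= 0:
        raise ValueError("total_params_b must be positive")
    if not 0 <= active_params_b <= total_params_b:
        raise ValueError("active_params_b must be between 0 and total_params_b")
    return active_params_b / total_params_b


# Figures taken from the summary above.
MOE_TOTAL_B = 26.0
MOE_ACTIVE_B = 3.8
DENSE_TOTAL_B = 31.0  # a dense model uses all of its parameters

moe_share = active_fraction(MOE_ACTIVE_B, MOE_TOTAL_B)
dense_vs_moe = DENSE_TOTAL_B / MOE_ACTIVE_B

print(f"26B MoE active share: {moe_share:.1%}")
print(f"31B Dense uses about {dense_vs_moe:.1f}x more active parameters than the MoE")
```

Under that assumption, the MoE model touches roughly one seventh of its weights on each step, which is why it can generate faster than a dense model of similar total size.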

Method

Gemma 4 pairs a Mixture of Experts (MoE) model for speed with a Dense model for quality, and adds memory-efficient designs for mobile. The models support agentic workflows and native function calling.
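
As an illustration of what native function calling with structured JSON output makes possible on the application side, the following minimal sketch parses a JSON tool call and dispatches it to a local function. Everything here is an assumption for illustration: the `{"name": ..., "arguments": {...}}` shape, the `get_weather` tool, and the `dispatch_tool_call` helper are hypothetical and do not reflect Gemma 4's actual output format or any official API. The model output is a hard-coded string standing in for a real response.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> str:
    """Hypothetical local tool; returns a canned answer for illustration."""
    return f"Sunny in {city}"


# Registry of functions the application is willing to expose to the model.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(raw_output: str) -> Any:
    """Parse a JSON tool call and run the matching registered function.

    Assumes the (hypothetical) shape {"name": str, "arguments": dict}.
    """
    call = json.loads(raw_output)  # raises json.JSONDecodeError on bad JSON
    if not isinstance(call, dict):
        raise ValueError("tool call must be a JSON object")
    name = call.get("name")
    args = call.get("arguments", {})
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name!r}")
    if not isinstance(args, dict):
        raise ValueError("'arguments' must be a JSON object")
    return TOOLS[name](**args)


# Stand-in for text produced by a locally running model.
model_output = '{"name": "get_weather", "arguments": {"city": "Zurich"}}'
print(dispatch_tool_call(model_output))
```

Validating the tool name against an explicit registry keeps the model from invoking anything the application has not deliberately exposed, which matters for agentic workflows running offline on local hardware.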

In practice

Run 26B MoE for fast local inference.
Fine-tune 31B Dense for specific tasks.
Utilize E2B/E4B for mobile AI applications.

Topics

Gemma 4
Apache 2.0 License
Local AI Models
Mobile AI
Agentic Workflows

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.