Gemma 4

2026-04-02 · Source: AINews · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Software Development & Engineering · Depth: Advanced, extended

Summary

Google DeepMind has released Gemma 4, a new family of open-weight, multimodal models under the Apache 2.0 license, designed for reasoning, agentic workflows, and local/edge deployment. The family has four models: a 31B dense model, a 26B MoE model (~4B active parameters), and two edge-optimized models (E4B and E2B) with native multimodal support for text, vision, and audio. Key features include function calling, structured JSON output, and context windows of up to 256K tokens. Early benchmarks position Gemma-4-31B as a top-tier open model, with a notable scientific-reasoning result (85.7% on GPQA Diamond). The release saw immediate ecosystem support across major local and serving stacks such as llama.cpp, Ollama, and vLLM, along with anecdotal reports of fast local inference, including 300 tokens/s on an M2 Ultra. Architectural notes highlight hybrid attention, MoE blocks as separate layers, and efficiency tricks, though some commentators suggest the gains come more from training data than from architecture.

Key takeaway

For CTOs and VPs of Engineering evaluating open-weight AI models for agentic workflows or edge deployment, Gemma 4's Apache 2.0 license, multimodal capabilities, and strong benchmark performance make it a compelling option. Its rapid ecosystem integration and fast local inference suggest a lower barrier to adoption and faster time-to-market for new applications. Consider prototyping with Gemma 4 for projects that require robust reasoning and on-device execution, especially given its competitive performance against larger models.

Key insights

Gemma 4 offers powerful, open-weight multimodal AI with strong local deployment and agentic capabilities under an Apache 2.0 license.

Principles

Open-weight models drive rapid ecosystem integration.
Hybrid architectures balance performance and efficiency.
Training data quality significantly impacts model capability.

Method

Gemma 4 uses a hybrid attention mechanism, MoE blocks implemented as separate layers, and techniques such as Proportional RoPE for memory optimization, which together enable efficient multimodal processing and long-context handling.
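
To make the "26B MoE, ~4B active" idea concrete, here is a minimal sketch of generic top-k mixture-of-experts routing. It is not Gemma 4's router: the function name moe_forward, the plain-list vectors, and the choice to renormalize weights over the selected experts are all assumptions made for illustration. The point it shows is that only k experts run per token, so the active parameter count is a fraction of the total.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
Expert = Callable[[Vector], Vector]


def softmax(xs: Sequence[float]) -> List[float]:
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def moe_forward(
    x: Vector,
    router_logits: Sequence[float],
    experts: Sequence[Expert],
    k: int,
) -> Vector:
    """Generic top-k MoE layer (illustrative, not Gemma 4's implementation).

    Assumes every expert returns a vector with the same length as x.
    """
    if len(router_logits) != len(experts):
        raise ValueError("need one router logit per expert")
    if not 1 <= k <= len(experts):
        raise ValueError("k must be between 1 and the number of experts")

    # Pick the k experts with the highest router scores.
    top = sorted(
        range(len(experts)), key=lambda i: router_logits[i], reverse=True
    )[:k]
    # Renormalize the scores over the selected experts only (an assumption).
    weights = softmax([router_logits[i] for i in top])

    out = [0.0] * len(x)
    for w, i in zip(weights, top):
        y = experts[i](x)  # only the selected experts are evaluated
        for j, v in enumerate(y):
            out[j] += w * v
    return out
```

With four experts and k=2, half of the expert parameters are skipped for each token; production routers add details such as load balancing that this sketch leaves out.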

In practice

Deploy Gemma 4 locally using llama.cpp or Ollama for edge applications.
Use Gemma 4's function calling for structured agentic workflows.
Explore MoE variants for large-model quality at reduced inference cost.
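
The function-calling item above can be sketched as a small dispatcher on the application side. This is a hedged illustration, not Gemma 4's documented tool-call format: the JSON shape {"name": ..., "arguments": {...}}, the TOOLS registry, and the get_weather function are all hypothetical. It shows how structured JSON output from a model can be validated before any local code runs.

```python
import json
from typing import Any, Callable, Dict


def get_weather(city: str) -> Dict[str, Any]:
    # Hypothetical tool: a stand-in for a real API call.
    return {"city": city, "forecast": "sunny"}


# Hypothetical registry mapping tool names to local Python functions.
TOOLS: Dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def dispatch_tool_call(raw: str) -> Dict[str, Any]:
    """Parse a model-emitted tool call and run the matching function.

    Assumed payload shape (not confirmed by the source):
    {"name": "<tool name>", "arguments": {...}}
    """
    try:
        call = json.loads(raw)
    except json.JSONDecodeError as exc:
        return {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    if not isinstance(call, dict):
        return {"ok": False, "error": "tool call must be a JSON object"}

    name = call.get("name")
    args = call.get("arguments", {})
    if not isinstance(name, str) or name not in TOOLS:
        return {"ok": False, "error": f"unknown tool: {name!r}"}
    if not isinstance(args, dict):
        return {"ok": False, "error": "arguments must be a JSON object"}

    try:
        return {"ok": True, "result": TOOLS[name](**args)}
    except TypeError as exc:
        # Raised when the arguments do not match the tool's signature.
        return {"ok": False, "error": f"bad arguments: {exc}"}
```

Returning an error object instead of raising lets an agent loop feed the failure back to the model for a retry, which is one common design choice among several.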

Topics

Gemma 4
Open-weight AI Models
Multimodal AI
AI Agents
Model Architecture

Code references

huggingface/transformers

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, Machine Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by AINews.