[AINews] Gemma 4: The best small Multimodal Open Models, dramatically better than Gemma 3 in every way

2026-04-03 · Source: Latent.Space (www.latent.space) · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, medium

Summary

Google DeepMind has released Gemma 4, a family of open-weight models under the commercially permissive Apache 2.0 license. The 31B dense variant ties with Kimi K2.5 (1T-A32B) and Z.ai GLM-5 (744B-A40B) as a top open model while using a small fraction of their parameter counts. Gemma 4 models are designed for reasoning and agentic workflows and for local/edge deployment. They accept video, image, and audio input natively, covering tasks such as OCR, chart understanding, and speech recognition. Key architectural choices include hybrid attention, proportional RoPE, Per-Layer Embeddings (PLE), KV-cache sharing, and a context window of up to 256K tokens. The release had immediate ecosystem support in llama.cpp, Ollama, and vLLM, along with early demonstrations of fast local inference.

Key takeaway

For CTOs and VPs of Engineering evaluating open models for multimodal applications or edge deployment, Gemma 4 is a compelling option. Its Apache 2.0 license, efficient architecture, native video, image, and audio input, and strong benchmark performance make it well suited to reasoning and agentic workflows. Because popular local inference frameworks supported it at launch, it is worth testing inside your existing infrastructure, particularly for on-device or resource-constrained environments.

Key insights

Gemma 4 advances open-source AI with multimodal capabilities, efficient architecture, and a permissive Apache 2.0 license.

Principles

Smaller models can achieve top-tier performance.
Multimodal input is crucial for edge AI.
Open-source licensing drives rapid adoption.

Method

Gemma 4 employs a hybrid 5:1 local/global attention mechanism, partial-dimension RoPE, and per-layer embeddings, alongside a refined training recipe and data, to achieve high performance with fewer parameters.
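
To make the attention and RoPE ideas above concrete, here is a minimal Python sketch, not Gemma 4's actual implementation. It assumes the global layer is the last in each group of six, that local layers use a causal sliding window, and that partial-dimension RoPE rotates only the leading fraction of each vector. The function names, window size, rotary fraction, and base are all hypothetical values chosen for illustration.

```python
import math


def layer_kinds(num_layers, local_per_global=5):
    """Label each layer 'local' or 'global' in a repeating 5:1 pattern.

    Assumption: the global layer closes each group of six.
    """
    period = local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local"
            for i in range(num_layers)]


def attention_mask(seq_len, kind, window=4):
    """Causal mask; local layers also limit lookback to `window` tokens."""
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: never attend to future tokens
            if kind == "local":
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask


def partial_rope(vec, position, rotary_fraction=0.25, base=10000.0):
    """Rotate only the leading fraction of dimensions; keep the rest as-is.

    Assumption: which dimensions are rotated, and the fraction, are
    illustrative choices rather than published Gemma 4 settings.
    """
    rot_dims = int(len(vec) * rotary_fraction)
    rot_dims -= rot_dims % 2  # rotations act on pairs of dimensions
    out = list(vec)
    for i in range(0, rot_dims, 2):
        theta = position / (base ** (i / rot_dims))
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out[i] = x * c - y * s
        out[i + 1] = x * s + y * c
    return out
```

Under these assumptions, only one layer in six pays the full quadratic attention cost over the whole context, while the other five look at a bounded window, which is one reason such a layout suits long contexts on limited hardware.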

In practice

Deploy Gemma 4 for local/edge multimodal AI tasks.
Use its function calling for agentic workflows.
Integrate with existing stacks via day-0 support in llama.cpp, Ollama, and vLLM.
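
As a rough illustration of the function-calling point above, the sketch below shows the application side of an agentic loop: parsing a tool call emitted by a model and running the matching function. The JSON shape (`name` plus `arguments`), the `get_weather` tool, and the `dispatch_tool_call` helper are hypothetical. Gemma 4's actual tool-call format is not described in the source and should be taken from the documentation of the runtime you use.

```python
import json


def get_weather(city):
    """Hypothetical tool; returns canned data so the sketch runs offline."""
    return {"city": city, "forecast": "sunny"}


# Registry of functions the model is allowed to call.
TOOLS = {"get_weather": get_weather}


def dispatch_tool_call(model_output):
    """Parse a JSON tool call and run the matching registered function.

    Assumed shape (not Gemma 4's documented format):
    {"name": "<tool>", "arguments": {...}}
    """
    try:
        call = json.loads(model_output)
    except json.JSONDecodeError:
        return {"error": "output is not valid JSON"}
    if not isinstance(call, dict):
        return {"error": "tool call must be a JSON object"}
    name = call.get("name")
    if name not in TOOLS:
        return {"error": f"unknown tool: {name}"}
    args = call.get("arguments", {})
    return {"result": TOOLS[name](**args)}
```

In a full loop, the returned dictionary would be serialized and passed back to the model as the tool result, and unknown or malformed calls surface as errors rather than being executed.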

Topics

Gemma 4
Multimodal AI Models
Open-weight Licensing
AI Agent Workflows
Edge AI Deployment

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space (www.latent.space).