[AINews] Gemma 4: The best small Multimodal Open Models, dramatically better than Gemma 3 in every way

· Source: Latent.Space - Www.latent.space · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, medium

Summary

Google DeepMind has released Gemma 4, a family of open-weight models under the commercially permissive Apache 2.0 license and a significant step forward for open models. The 31B dense variant ties with Kimi K2.5 (1T-A32B) and Z.ai's GLM-5 (744B-A40B) among the top open models, despite having far fewer total parameters (31B versus 744B to 1T). Gemma 4 models are designed for reasoning and agentic workflows, support local and edge deployment, and natively handle video, images, and audio, including OCR, chart understanding, and speech recognition. Key architectural choices include hybrid attention, proportional RoPE, Per-Layer Embeddings (PLE), KV-cache sharing, and a context window of up to 256K tokens. The release had immediate ecosystem support in llama.cpp, Ollama, and vLLM, along with impressive demonstrations of local inference performance.
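
Hybrid attention and KV-cache sharing matter most for long-context memory. As a rough sketch, the arithmetic below estimates KV-cache size at a 256K-token context for a hypothetical model. Only the 5:1 local/global ratio comes from the summary; the layer count, head sizes, 1,024-token sliding window (treating "local" layers as sliding-window attention), and the simple "trailing layers reuse an earlier layer's cache" model of KV sharing are all assumptions for illustration, not Gemma 4's published configuration.

```python
# Back-of-the-envelope KV-cache size for a single sequence.
# Every model dimension below is a HYPOTHETICAL placeholder, not Gemma 4's real config.
from dataclasses import dataclass


@dataclass
class AttnConfig:
    num_layers: int = 48        # hypothetical decoder depth
    num_kv_heads: int = 8       # hypothetical number of key/value heads
    head_dim: int = 128         # hypothetical per-head dimension
    bytes_per_value: int = 2    # bf16/fp16 cache entries
    local_per_global: int = 5   # 5:1 local/global interleave, per the summary
    window: int = 1024          # hypothetical sliding window for local layers
    shared_kv_layers: int = 0   # trailing layers that reuse earlier KV (simplified assumption)


def layer_kinds(cfg: AttnConfig) -> list[str]:
    """Label each layer: every (local_per_global + 1)-th layer is global, the rest local."""
    period = cfg.local_per_global + 1
    return ["global" if (i + 1) % period == 0 else "local" for i in range(cfg.num_layers)]


def kv_cache_bytes(cfg: AttnConfig, context_len: int, hybrid: bool = True) -> int:
    """Bytes of K and V stored across all layers that own a cache."""
    per_token = 2 * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_value  # K plus V
    kinds = layer_kinds(cfg) if hybrid else ["global"] * cfg.num_layers
    # Layers that read another layer's cache store nothing of their own.
    owning = kinds[: cfg.num_layers - cfg.shared_kv_layers]
    total = 0
    for kind in owning:
        # Global layers cache the full context; local layers only the sliding window.
        cached = context_len if kind == "global" else min(context_len, cfg.window)
        total += cached * per_token
    return total


if __name__ == "__main__":
    ctx = 256 * 1024  # 256K-token context
    gib = 2**30
    print(f"all-global:   {kv_cache_bytes(AttnConfig(), ctx, hybrid=False) / gib:.1f} GiB")  # 48.0
    print(f"hybrid 5:1:   {kv_cache_bytes(AttnConfig(), ctx) / gib:.1f} GiB")  # 8.2
    print(f"+ KV sharing: {kv_cache_bytes(AttnConfig(shared_kv_layers=12), ctx) / gib:.1f} GiB")  # 6.1
```

The ratio matters more than the absolute numbers: under these assumptions, restricting five of every six layers to a sliding window cuts the long-context cache by roughly 6x, and letting later layers reuse an earlier cache trims it further.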

Key takeaway

For CTOs and VPs of Engineering evaluating open models for multimodal applications or edge deployment, Gemma 4 is a compelling option. Its Apache 2.0 license, efficient architecture, native video, image, and audio processing, and strong benchmark performance make it well suited to reasoning and agentic workflows. Given its day-one support in popular local inference frameworks, it is worth piloting within your existing infrastructure, particularly for on-device or resource-constrained deployments.
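
Whether a model fits on a given device starts with weight memory. The quick sketch below estimates this for the 31B dense variant, treating "31B" as a nominal parameter count (an assumption) and counting weights only. KV cache, activations, and runtime overhead come on top, and common quantized formats also store per-block scale factors, so real model files run somewhat larger.

```python
# Weight-only memory estimate for a dense model at several precisions.
# 31e9 is the nominal "31B" size from the summary, not an exact parameter count.
PARAMS = 31e9

BITS_PER_WEIGHT = {"bf16": 16, "int8": 8, "4-bit": 4}


def weight_gib(num_params: float, bits: float) -> float:
    """Convert a parameter count and bits per weight to GiB."""
    return num_params * bits / 8 / 2**30


if __name__ == "__main__":
    for name, bits in BITS_PER_WEIGHT.items():
        print(f"{name:>5}: {weight_gib(PARAMS, bits):5.1f} GiB")
    # prints roughly 57.7, 28.9 and 14.4 GiB
```

On this estimate, unquantized weights call for a high-memory accelerator, while a 4-bit build plausibly fits a single 24 GB consumer GPU before long-context KV cache is added.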

Key insights

Gemma 4 advances open-source AI with multimodal capabilities, efficient architecture, and a permissive Apache 2.0 license.

Principles

Method

Gemma 4 combines a hybrid attention scheme (five local attention layers for every global layer) with partial-dimension RoPE and per-layer embeddings, plus a refined training recipe and data, to achieve high performance with fewer parameters.
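
Partial-dimension RoPE applies rotary position encoding to only some of each attention head's dimensions, leaving the rest to carry content without positional rotation. The sketch below is a generic illustration, not Gemma 4's implementation: it assumes a GPT-NeoX-style layout (rotate a leading slice of the head, pairing dimension i with i + half), and the 0.25 fraction and 10,000 base are illustrative defaults. It checks the property that makes RoPE useful: the query-key score depends only on the distance between positions.

```python
# Partial-dimension rotary position embedding (RoPE) for one head vector.
# Assumptions: GPT-NeoX-style layout (rotate the leading slice, "rotate-half" pairing),
# an illustrative rotary_fraction of 0.25 and base of 10,000; not Gemma 4's published values.
import math


def rope_partial(x: list[float], position: int, rotary_fraction: float = 0.25,
                 base: float = 10_000.0) -> list[float]:
    """Rotate the leading `rotary_fraction` of x's dims by `position`; copy the rest."""
    rot_dim = int(len(x) * rotary_fraction)
    rot_dim -= rot_dim % 2              # dims are rotated in pairs
    half = rot_dim // 2
    out = list(x)                       # non-rotary dims pass through unchanged
    for i in range(half):
        theta = position * base ** (-2 * i / rot_dim)  # lower i -> faster rotation
        c, s = math.cos(theta), math.sin(theta)
        a, b = x[i], x[i + half]        # pair dim i with dim i + half
        out[i] = a * c - b * s
        out[i + half] = a * s + b * c
    return out


def dot(u: list[float], v: list[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


if __name__ == "__main__":
    q = [0.3, -1.2, 0.5, 2.0, 1.0, -0.4, 0.8, 0.1]
    k = [1.1, 0.4, -0.7, 0.9, 2.0, 0.6, -1.5, 0.2]
    # Scores depend only on the offset between positions (here 3),
    # not on where the pair sits in the sequence.
    s_near = dot(rope_partial(q, 10, 0.5), rope_partial(k, 7, 0.5))
    s_far = dot(rope_partial(q, 103, 0.5), rope_partial(k, 100, 0.5))
    print(math.isclose(s_near, s_far))  # True
```

The non-rotated dimensions contribute the same amount to the score at every distance, so they act as a position-independent channel alongside the rotated ones.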

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space - Www.latent.space.