[AINews] Gemma 4: The best small Multimodal Open Models, dramatically better than Gemma 3 in every way
Summary
Google DeepMind has released Gemma 4, a family of open-weight models under the commercially permissive Apache 2.0 license, marking a significant advancement in open-weight AI. The 31B dense variant ties with Kimi K2.5 (1T-A32B) and Z.ai GLM-5 (744B-A40B) as a top open model while using substantially fewer parameters. Gemma 4 models are designed for reasoning and agentic workflows, support local/edge deployment, and feature native multimodal capabilities for video, images, and audio, including OCR, chart understanding, and speech recognition. Key architectural choices include hybrid attention mechanisms, proportional RoPE, Per-Layer Embeddings (PLE), KV-cache sharing, and long context up to 256K tokens. The release saw immediate ecosystem support across platforms such as llama.cpp, Ollama, and vLLM, along with demonstrations of fast local inference.
Key takeaway
For CTOs and VPs of Engineering evaluating open-weight AI models for multimodal applications or edge deployment, Gemma 4 is a compelling option. Its Apache 2.0 license, efficient architecture, and native support for video, image, and audio processing, combined with strong benchmark performance, make it suitable for building reasoning and agentic workflows. Because it is already supported by popular local inference frameworks, it is worth testing in your existing infrastructure, particularly for on-device or resource-constrained environments.
Key insights
Gemma 4 advances open-source AI with multimodal capabilities, efficient architecture, and a permissive Apache 2.0 license.
Principles
- Smaller models can achieve top-tier performance.
- Multimodal input is crucial for edge AI.
- Open-source licensing drives rapid adoption.
Method
Gemma 4 employs a hybrid 5:1 local/global attention mechanism, partial-dimension RoPE, and per-layer embeddings, alongside a refined training recipe and data, to achieve high performance with fewer parameters.
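To make the 5:1 local/global idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not Gemma 4's actual implementation: the position of the global layer within each group of six, the layer count, and the window size are all hypothetical values chosen for the example, and the helper names `layer_schedule` and `attention_mask` are invented here.

```python
from typing import List


def layer_schedule(num_layers: int, local_per_global: int = 5) -> List[str]:
    """Label each layer 'local' or 'global' in a repeating N:1 pattern.

    Assumption: the global layer is placed last in each group of
    (local_per_global + 1) layers. The real ordering may differ.
    """
    period = local_per_global + 1
    return [
        "global" if (i + 1) % period == 0 else "local"
        for i in range(num_layers)
    ]


def attention_mask(seq_len: int, kind: str, window: int) -> List[List[bool]]:
    """Build a causal attention mask as a list of rows (query x key).

    Global layers let each query see every earlier position.
    Local layers additionally limit each query to the most recent
    `window` positions (a sliding window), which bounds KV-cache growth.
    """
    mask = []
    for q in range(seq_len):
        row = []
        for k in range(seq_len):
            visible = k <= q  # causal: no looking ahead
            if kind == "local":
                visible = visible and (q - k) < window
            row.append(visible)
        mask.append(row)
    return mask


if __name__ == "__main__":
    # Hypothetical 12-layer stack: 10 local layers, 2 global layers.
    print(layer_schedule(12))
    # Hypothetical window of 2 tokens over a 4-token sequence.
    for row in attention_mask(4, "local", window=2):
        print(row)
```

The design point this illustrates is that only the global layers need to keep keys and values for the full context, so a mostly-local stack needs far less cache memory at long context lengths than an all-global one.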
In practice
- Deploy Gemma 4 for local/edge multimodal AI tasks.
- Use its function calling for agentic workflows.
- Integrate with existing stacks via day-0 support.
Topics
- Gemma 4
- Multimodal AI Models
- Open-weight Licensing
- AI Agent Workflows
- Edge AI Deployment
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space (www.latent.space).