IBM Granite 3.0 models
Summary
IBM has released a selection of its Granite 3.0 models, now available for deployment via Ollama under an Apache 2.0 license as of October 21, 2024. The Granite 3.0 series includes both dense and Mixture of Expert (MoE) architectures. The text-only dense LLMs, Granite 2B and Granite 8B, were trained on over 12 trillion tokens and show performance comparable to Llama 3.1 8B Instruct on OpenLLM Leaderboard v1 and v2 benchmarks. These dense models are optimized for tool-based use cases, RAG, code generation, translation, and bug fixing. Additionally, IBM introduced Granite 1B MoE and Granite 3B MoE, trained on over 10 trillion tokens, specifically designed for low-latency, on-device, and instantaneous inference applications.
Key takeaway
For MLOps Engineers evaluating new open-source LLMs for deployment, consider the IBM Granite 3.0 models available through Ollama. The Granite 8B Instruct model offers performance on par with Llama 3.1 8B Instruct for general tasks, while the MoE variants (1B and 3B) are specifically engineered for low-latency, on-device inference, making them suitable for edge computing or real-time applications where speed is critical.
Key insights
IBM's Granite 3.0 models, including dense and MoE variants, are now available via Ollama under Apache 2.0.
Principles
- MoE models excel in low-latency inference scenarios.
- Dense LLMs support tool-based use cases and RAG.
In practice
- Use `ollama run granite3-dense:8b` for dense 8B model.
- Deploy Granite MoE for on-device applications.
Topics
- IBM Granite 3.0
- Large Language Models
- Mixture-of-Experts
- Retrieval-Augmented Generation
- Ollama Integration
Best for: AI Architect, MLOps Engineer, NLP Engineer, AI Engineer, Machine Learning Engineer, Data Scientist
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Ollama Blog.