Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality
Summary
IBM Granite has released Granite Embedding Multilingual R2, a pair of Apache 2.0 licensed multilingual embedding models designed for enterprise use: a compact 97M-parameter version and a full-size 311M-parameter version. Both support over 200 languages, with enhanced retrieval quality for 52 languages and 9 programming languages, and both extend the context window to 32,768 tokens, a 64x increase over the previous R1 models. The 97M model scores 60.3 on MTEB Multilingual Retrieval, outperforming all other open sub-100M multilingual embedders, while the 311M model scores 65.2 and adds Matryoshka embedding support for flexible dimensionality. The models are built on the ModernBERT architecture and work with popular tools such as `sentence-transformers`, LangChain, LlamaIndex, Haystack, and Milvus.
Key takeaway
For AI architects and NLP engineers building multilingual RAG systems or cross-lingual search, the Granite Embedding Multilingual R2 models are a compelling upgrade. They improve retrieval quality and context handling across 200+ languages and code, especially for long documents, and they plug into existing frameworks without complex integration. Choose the 97M model for efficiency or the 311M model for peak retrieval quality and Matryoshka flexibility.
Key insights
The new IBM Granite multilingual embedding models combine strong retrieval quality (60.3 and 65.2 on MTEB Multilingual Retrieval for the 97M and 311M models) with a 32,768-token context window for diverse language and code applications.
Principles
- Compact models can achieve high retrieval quality.
- Longer context windows significantly improve long-document retrieval.
- Matryoshka embeddings enable flexible dimension-quality trade-offs.
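
The dimension-quality trade-off in the last principle can be illustrated with a minimal sketch. It assumes the usual Matryoshka convention of keeping the leading components of a vector and rescaling to unit length; the helper names `truncate_embedding` and `cosine` and the 8-dimensional toy vectors are hypothetical and stand in for real model output.

```python
import math


def truncate_embedding(vector, dim):
    """Keep the first `dim` components and rescale to unit length."""
    if not 0 < dim <= len(vector):
        raise ValueError("dim must be between 1 and len(vector)")
    head = vector[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        return list(head)  # an all-zero prefix cannot be normalized
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy vectors (assumption): real embeddings have hundreds of dimensions.
query = [0.9, 0.1, 0.3, 0.0, 0.2, 0.1, 0.0, 0.1]
doc = [0.8, 0.2, 0.4, 0.1, 0.0, 0.3, 0.1, 0.0]

full_score = cosine(query, doc)
short_score = cosine(truncate_embedding(query, 4), truncate_embedding(doc, 4))
```

Halving the dimension halves the storage per vector and the cost of each similarity computation. How much retrieval quality is lost at a given size depends on the model and should be measured on your own data.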
Method
The models are trained with knowledge distillation from multiple teachers, contrastive fine-tuning on multilingual retrieval pairs, and model merging, with vocabulary selection applied to the compact version.
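
To make the contrastive fine-tuning step more concrete, here is a rough sketch of one common formulation, the InfoNCE loss with in-batch negatives. The summary does not say which loss the Granite team used, so this is an illustration of the general technique rather than their recipe; the function name `info_nce_loss` and the temperature of 0.05 are assumptions.

```python
import math


def info_nce_loss(similarities, temperature=0.05):
    """Mean InfoNCE loss over a batch of query-passage pairs.

    similarities[i][j] is the similarity between query i and passage j.
    Passage i is the positive for query i; every other passage in the
    row acts as an in-batch negative.
    """
    total = 0.0
    for i, row in enumerate(similarities):
        logits = [s / temperature for s in row]
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(z - peak) for z in logits))
        total += log_denominator - logits[i]
    return total / len(similarities)


# Toy similarity matrices (assumption): rows are queries, columns are passages.
well_aligned = [[0.9, 0.1], [0.2, 0.8]]  # positives score highest
misaligned = [[0.1, 0.9], [0.8, 0.2]]    # negatives score highest
```

The loss is small when each query is most similar to its own passage and grows when a negative outscores the positive, which is what pushes matching pairs together across languages during fine-tuning.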
In practice
- Use `granite-embedding-97m-multilingual-r2` for high throughput.
- Truncate the 311M model's Matryoshka embeddings to reduce storage and computation.
- Integrate with LangChain or LlamaIndex via a one-line model change.
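
A minimal retrieval sketch with `sentence-transformers` might look like the following. The `ibm-granite/` organization prefix on the model ID is an assumption (the summary gives only the model name), and the helpers `load_encoder`, `rank_passages`, and `search` are hypothetical names introduced for illustration.

```python
def load_encoder(model_id="ibm-granite/granite-embedding-97m-multilingual-r2"):
    """Load the model through sentence-transformers.

    The organization prefix in the default model ID is an assumption.
    Weights are downloaded on the first call.
    """
    # Imported lazily so the ranking helper works without the package.
    from sentence_transformers import SentenceTransformer

    return SentenceTransformer(model_id)


def rank_passages(query_vector, passage_vectors):
    """Return passage indices sorted by descending dot product with the query."""
    scores = [
        sum(q * p for q, p in zip(query_vector, vec)) for vec in passage_vectors
    ]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


def search(encoder, query, passages):
    """Embed a query and its candidate passages, then order passages best-first."""
    # With normalized embeddings the dot product equals cosine similarity.
    vectors = encoder.encode([query] + passages, normalize_embeddings=True)
    order = rank_passages(vectors[0], vectors[1:])
    return [passages[i] for i in order]
```

Calling `search(load_encoder(), "How do I request leave?", passages)` with a list of passages in several languages would return them ordered by similarity to the query. For larger collections, the same vectors would typically be stored in a vector database rather than ranked in memory.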
Topics
- Multilingual Embeddings
- Granite Embedding R2
- ModernBERT Architecture
- Matryoshka Representation Learning
- MTEB Multilingual Retrieval
Best for: AI Architect, NLP Engineer, Research Scientist, Machine Learning Engineer, AI Scientist, AI Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.