Granite Embedding Multilingual R2: Open Apache 2.0 Multilingual Embeddings with 32K Context — Best Sub-100M Retrieval Quality

2026-05-04 · Source: Hugging Face - Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Advanced, long

Summary

IBM Granite has released Granite Embedding Multilingual R2, a pair of multilingual embedding models licensed under Apache 2.0 and designed for enterprise use: a compact 97M-parameter model and a full-size 311M-parameter model. Both support more than 200 languages, with enhanced retrieval quality for 52 languages and 9 programming languages, and both extend the context window to 32,768 tokens, 64 times that of the previous R1 models. The 97M model scores 60.3 on MTEB Multilingual Retrieval, outperforming all other open sub-100M multilingual embedders; the 311M model scores 65.2 and supports Matryoshka embeddings for flexible dimensionality. Both models are built on the ModernBERT architecture and work with popular frameworks such as `sentence-transformers`, LangChain, LlamaIndex, Haystack, and Milvus.

Key takeaway

For AI Architects and NLP Engineers building multilingual RAG systems or cross-lingual search, the Granite Embedding Multilingual R2 models are a practical upgrade. Teams can improve retrieval quality and long-document handling across 200+ languages and code with little integration work. Consider the 97M model when throughput matters most, or the 311M model for the highest retrieval scores and Matryoshka flexibility.

Key insights

New IBM Granite multilingual embedding models offer superior retrieval quality and extended context for diverse language and code applications.

Principles

Compact models can achieve high retrieval quality.
Longer context windows significantly improve long-document retrieval.
Matryoshka embeddings enable flexible dimension-quality trade-offs.
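
As a rough illustration of the dimension-quality trade-off, the sketch below shortens embedding vectors by keeping a prefix of their coordinates and re-normalizing, which is how Matryoshka-style embeddings are typically truncated. The 768-dimensional width, the 256-dimensional target, and the random stand-in vectors are assumptions made for illustration; none of them comes from the Granite documentation, and random vectors show only the mechanics, not retrieval quality.

```python
import numpy as np


def truncate_and_normalize(embeddings: np.ndarray, dim: int) -> np.ndarray:
    """Keep the first `dim` coordinates of each row, then L2-normalize each row."""
    if not 0 < dim <= embeddings.shape[1]:
        raise ValueError("dim must be between 1 and the full embedding width")
    truncated = embeddings[:, :dim]
    norms = np.linalg.norm(truncated, axis=1, keepdims=True)
    # Guard against division by zero for all-zero rows.
    return truncated / np.clip(norms, 1e-12, None)


# Stand-in data: random vectors, NOT real model output.
rng = np.random.default_rng(0)
full = rng.normal(size=(4, 768))  # 768 is an assumed width, for illustration only
small = truncate_and_normalize(full, 256)

print(small.shape)                   # (4, 256)
print(full.nbytes // small.nbytes)   # 3, i.e. one third of the storage
```

Re-normalizing after truncation keeps dot products equal to cosine similarities, so an index built on the shorter vectors can use the same scoring function as the full-width one.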

Method

Models are trained using knowledge distillation from multiple teachers, contrastive fine-tuning on multilingual retrieval pairs, and model merging, with vocabulary selection for compact versions.
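
The summary does not say which loss the contrastive stage uses. As an assumption, the sketch below shows a common choice for training on retrieval pairs: an InfoNCE loss with in-batch negatives, where each query's paired passage is the positive and every other passage in the batch is a negative. The function name, the temperature of 0.05, and the random test vectors are all hypothetical and are not taken from the Granite training recipe.

```python
import numpy as np


def info_nce_loss(queries: np.ndarray, passages: np.ndarray,
                  temperature: float = 0.05) -> float:
    """Mean cross-entropy over a batch where row i of `passages` is the
    positive for row i of `queries` and all other rows are negatives."""
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = passages / np.linalg.norm(passages, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature              # cosine similarity, scaled
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))    # positives sit on the diagonal


# Stand-in batch: random vectors, NOT real model output.
rng = np.random.default_rng(0)
passages = rng.normal(size=(8, 16))
aligned_queries = passages + 0.01 * rng.normal(size=(8, 16))

loss_aligned = info_nce_loss(aligned_queries, passages)
loss_shuffled = info_nce_loss(aligned_queries, np.roll(passages, 1, axis=0))
print(loss_aligned < loss_shuffled)
```

The loss is low when each query is closest to its own passage and rises when the pairing is broken, which is the signal contrastive fine-tuning optimizes.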

In practice

Use `granite-embedding-97m-multilingual-r2` for high-throughput workloads.
Truncate 311M embeddings to fewer dimensions to reduce storage and computation.
Integrate with LangChain or LlamaIndex via a one-line model change.
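
A minimal retrieval sketch along these lines, using `sentence-transformers`, might look as follows. The `ibm-granite/` Hub prefix in `MODEL_ID` is an assumption (the summary gives only the model name), and `load_encoder`, `rank_passages`, and `search` are hypothetical helper names. The model import is deferred so the ranking logic can be tried without downloading weights.

```python
import numpy as np

# Assumed Hub path: the "ibm-granite/" prefix is a guess, not confirmed by the summary.
MODEL_ID = "ibm-granite/granite-embedding-97m-multilingual-r2"


def load_encoder(model_id: str = MODEL_ID):
    """Load the embedding model; requires `pip install sentence-transformers`."""
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer(model_id)


def rank_passages(query_vec: np.ndarray, passage_vecs: np.ndarray) -> np.ndarray:
    """Return passage indices sorted by descending dot product with the query.
    With unit-length vectors the dot product equals cosine similarity."""
    scores = passage_vecs @ query_vec
    return np.argsort(-scores)


def search(encoder, query: str, passages: list[str]) -> list[str]:
    """Embed the query and passages together, then return passages best-first."""
    vecs = np.asarray(encoder.encode([query, *passages], normalize_embeddings=True))
    order = rank_passages(vecs[0], vecs[1:])
    return [passages[i] for i in order]
```

With the library installed, `search(load_encoder(), query, passages)` would run the same ranking against the real model; the first call downloads the weights. Switching to the 311M model would then be a change to `MODEL_ID` only, assuming its Hub name follows the same pattern.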

Topics

Multilingual Embeddings
Granite Embedding R2
ModernBERT Architecture
Matryoshka Representation Learning
MTEB Multilingual Retrieval

Best for: AI Architect, NLP Engineer, Research Scientist, Machine Learning Engineer, AI Scientist, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.