SOTA Embedding Model for Agentic Workflows Now in Public Preview

Source: Databricks · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Data Science & Analytics · Depth: Intermediate, quick

Summary

Qwen3-Embedding-0.6B is a state-of-the-art embedding model in the 0.6B-parameter class, now hosted on Databricks. It outperforms larger models, including flagship offerings from OpenAI and Cohere, on the MTEB leaderboards, delivering top-tier retrieval quality at lower latency and cost. It supports Matryoshka Representation Learning (MRL), which lets embeddings be safely truncated at request time to any size from 32 to 1024 dimensions, giving fine-grained control over the trade-off between cost and recall. As the first multilingual embedding model on Databricks, it supports over 100 languages and performs well on multilingual and cross-lingual tasks, such as searching in one language and retrieving results in another. The model runs on fully managed serverless GPUs through Databricks Foundation Model APIs, which handle provisioning and autoscaling while respecting data residency requirements. It is available as "databricks-qwen3-embedding-0-6b" in all clouds and regions that support Foundation Model Serving, and can be accessed through Pay-Per-Token, AI Functions, and Provisioned Throughput. Typical uses are semantic search, RAG pipelines, and text classification.
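
To make the MRL idea concrete, here is a minimal client-side sketch of what truncating an embedding means: keep the leading components and rescale the result to unit length so cosine similarity still behaves. It is not the Databricks API. The summary says truncation is done at request time by the service, and the function names `truncate_embedding` and `cosine` and the toy vectors below are assumptions made for illustration.

```python
import math


def truncate_embedding(vec, dim):
    """Keep the first `dim` components of `vec` and rescale to unit length.

    Illustrative helper (hypothetical name), not part of any Databricks SDK.
    """
    if not 1 <= dim <= len(vec):
        raise ValueError("dim must be between 1 and len(vec)")
    head = vec[:dim]
    norm = math.sqrt(sum(x * x for x in head))
    if norm == 0.0:
        # An all-zero prefix cannot be normalized; return it unchanged.
        return list(head)
    return [x / norm for x in head]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Toy 8-dimensional vectors standing in for real 1024-dimensional embeddings.
query = [0.9, 0.1, 0.3, 0.0, 0.05, 0.02, 0.01, 0.0]
doc = [0.8, 0.2, 0.4, 0.1, 0.00, 0.03, 0.02, 0.01]

full_score = cosine(query, doc)
short_score = cosine(truncate_embedding(query, 4), truncate_embedding(doc, 4))
```

With an MRL-trained model, most of the useful signal sits in the leading dimensions, so the similarity computed on the shortened vectors is expected to stay close to the full-length one. How close depends on the model and the data, so the right dimension is best chosen by measuring recall on your own queries.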

Key takeaway

Qwen3-Embedding-0.6B, a new multilingual model on Databricks, achieves state-of-the-art quality for its size, outperforming most 0.6B models and rivaling 7B+ models on MTEB benchmarks. Its Matryoshka Representation Learning (MRL) support allows embeddings to be truncated at request time to anywhere from 32 to 1024 dimensions, letting users trade recall against cost. The result is top-tier, low-latency multilingual retrieval for RAG and semantic search, with secure serverless deployment.
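
The cost side of that trade-off can be estimated with simple arithmetic. The sketch below assumes each embedding value is stored as a 4-byte float and ignores index overhead. Both are assumptions rather than figures from the source, and `index_size_bytes` is a hypothetical helper.

```python
def index_size_bytes(num_vectors, dim, bytes_per_value=4):
    """Raw storage for `num_vectors` embeddings of `dim` values each.

    Assumes 4-byte (float32) values by default and no index overhead.
    """
    return num_vectors * dim * bytes_per_value


# One million documents at the largest and smallest supported sizes.
full = index_size_bytes(1_000_000, 1024)   # 4,096,000,000 bytes
small = index_size_bytes(1_000_000, 32)    # 128,000,000 bytes
ratio = full // small                      # 32x less raw storage
```

Under these assumptions, dropping from 1024 to 32 dimensions cuts raw vector storage by a factor of 32. The recall lost in exchange is not stated in the source and has to be measured per workload.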

Topics

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Databricks.