Build your own Hybrid Search
Summary
This article details how to upgrade a BM25-based search engine into a scalable hybrid search system built on Vespa that handles more than 1.3 million documents. The upgrade adds semantic search alongside traditional keyword search. It introduces two key components: Hierarchical Navigable Small World (HNSW) graphs for fast nearest-neighbor search over embeddings, and global-phase ranking to combine keyword (BM25) and embedding-based (semantic) scores. The schema gains a `text_embedding` field of type `tensor<float>(x[384])`, filled with embeddings from the Sentence Transformer model all-MiniLM-L6-v2. Two new rank profiles are configured, "semantic" and "fusion". The "fusion" profile uses reciprocal rank fusion (RRF) to merge the rankings produced by BM25 and by semantic similarity, which improves relevance, especially for complex queries. In the demonstrations, hybrid search returns clearly more relevant results than BM25 alone, with search times in single-digit milliseconds.
Key takeaway
For AI engineers building scalable search systems, hybrid search with Vespa offers a substantial gain in relevance. Consider using HNSW for fast embedding retrieval and reciprocal rank fusion in your rank profiles to combine lexical and semantic signals, especially for large document corpora and complex user queries. In the article's demonstrations, this approach improved relevance while keeping search times in single-digit milliseconds.
Key insights
Hybrid search combining BM25 and semantic embeddings dramatically improves relevance for complex queries over large datasets.
Principles
- HNSW accelerates embedding search for millions of documents.
- Global-phase ranking combines keyword (BM25) and embedding-based (semantic) scores.
- Reciprocal Rank Fusion (RRF) merges ranks, not raw scores, so BM25 and similarity scores need not be normalized to a common scale.
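The RRF principle above fits in a few lines of Python. This is a minimal illustration, not Vespa's implementation: `reciprocal_rank_fusion` and `to_ranking` are hypothetical helpers, the scores are made up, and `k = 60` is the smoothing constant commonly used with RRF rather than a value taken from the article.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse ranked lists of document IDs with RRF (hypothetical helper).

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks. Only positions matter, never the raw scores.
    """
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by doc ID for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def to_ranking(scores):
    """Turn {doc_id: score} into a list of IDs, best score first."""
    return sorted(scores, key=lambda d: -scores[d])


# Illustrative scores on very different scales: BM25 is unbounded,
# while cosine-style similarities lie in [-1, 1].
bm25_scores = {"doc_a": 12.4, "doc_b": 9.1, "doc_c": 3.3}
semantic_scores = {"doc_b": 0.82, "doc_d": 0.79, "doc_a": 0.41}

fused = reciprocal_rank_fusion([to_ranking(bm25_scores), to_ranking(semantic_scores)])
print(fused)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

`doc_b` wins because it ranks well in both lists, even though `doc_a` has the top BM25 score. Multiplying every BM25 score by 1000 leaves the fused order unchanged, which is the point of fusing ranks instead of scores.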
Method
Modify the schema to add a `text_embedding` field (e.g., `tensor<float>(x[384])`) indexed with HNSW and the angular distance metric. Configure a "fusion" rank profile that uses RRF in the global phase to combine BM25 and semantic closeness scores.
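To make the global-phase step in this method concrete, here is a hypothetical Python sketch of what a fusion profile does conceptually: order candidates by a cheap first-phase score, rescore only the top `rerank_count` of them by fusing their BM25 and closeness rankings with RRF, and keep the remaining hits below. The helper name, the hit dictionaries, and all numbers are illustrative assumptions. Vespa performs this inside its own ranking framework, and how it orders hits outside the rerank window may differ from this sketch.

```python
def rrf_rerank(hits, rerank_count=100, k=60):
    """Global-phase style hybrid rerank (hypothetical helper).

    hits: list of dicts with 'id', 'first_phase', 'bm25', 'closeness'.
    Only the top `rerank_count` hits by first-phase score are rescored
    with RRF; in this sketch the rest keep first-phase order below them.
    """
    ordered = sorted(hits, key=lambda h: -h["first_phase"])
    head, tail = ordered[:rerank_count], ordered[rerank_count:]

    fused = {h["id"]: 0.0 for h in head}
    for feature in ("bm25", "closeness"):
        # Rank the head by this feature (1 = best) and add 1 / (k + rank).
        by_feature = sorted(head, key=lambda h: -h[feature])
        for rank, hit in enumerate(by_feature, start=1):
            fused[hit["id"]] += 1.0 / (k + rank)

    reranked = sorted(head, key=lambda h: -fused[h["id"]])
    return [h["id"] for h in reranked] + [h["id"] for h in tail]


hits = [
    {"id": "d1", "first_phase": 0.9, "bm25": 15.0, "closeness": 0.30},
    {"id": "d2", "first_phase": 0.8, "bm25": 9.0, "closeness": 0.85},
    {"id": "d3", "first_phase": 0.7, "bm25": 2.0, "closeness": 0.60},
    {"id": "d4", "first_phase": 0.1, "bm25": 20.0, "closeness": 0.90},
]
print(rrf_rerank(hits, rerank_count=3))  # ['d2', 'd1', 'd3', 'd4']
print(rrf_rerank(hits, rerank_count=4))  # ['d4', 'd2', 'd1', 'd3']
```

With a window of 3, `d4` stays last despite having the best BM25 and closeness scores, because it never enters the rerank step; widening the window to 4 lets it rise to the top. That is the trade-off the rerank count controls: a larger window can recover more good hits but costs more work per query.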
In practice
- Use Sentence Transformers for generating document embeddings.
- Set `rerank-count` in the global phase to limit how many top hits the expensive global-phase expression rescores.
- Use the angular distance metric, which compares embeddings by direction (cosine similarity) rather than magnitude.
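The angular metric, and the exact search that HNSW approximates, can be sketched in plain Python. Assumptions: toy 3-dimensional vectors stand in for the 384-dimensional all-MiniLM-L6-v2 embeddings, `exact_nearest` is a hypothetical brute-force helper (HNSW aims to return approximately the same neighbors far faster on millions of documents), and `closeness` is assumed here to be computed as 1 / (1 + distance).

```python
import math


def angular_distance(u, v):
    """Angle in radians between u and v: 0 = same direction, pi = opposite."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_sim = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.acos(cos_sim)


def closeness(u, v):
    # Assumed mapping: 1.0 for identical direction, shrinking as the angle grows.
    return 1.0 / (1.0 + angular_distance(u, v))


def exact_nearest(query, docs, k=2):
    """Brute-force top-k by angular distance (the result HNSW approximates)."""
    return sorted(docs, key=lambda doc_id: angular_distance(query, docs[doc_id]))[:k]


docs = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.0, 1.0, 0.0],
    "doc_c": [2.0, 2.0, 0.0],
}
query = [1.0, 0.2, 0.0]
print(exact_nearest(query, docs))           # ['doc_a', 'doc_c']
print(closeness([1.0, 0.0], [5.0, 0.0]))    # 1.0
```

The last call shows why angular distance suits embedding similarity: scaling a vector does not change its angle, so only direction affects the score.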
Topics
- Hybrid Search
- Semantic Search
- Vespa
- HNSW
- Reciprocal Rank Fusion
Best for: Machine Learning Engineer, AI Engineer, Software Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Abhishek Thakur.