Implementing Hybrid Semantic-Lexical Search in RAG
Summary
This article details the implementation of a hybrid search strategy for Retrieval-Augmented Generation (RAG) systems, combining BM25 lexical search with dense vector semantic search, and fusing their results using Reciprocal Rank Fusion (RRF). Published on May 25, 2026, the guide uses Python libraries like "rank_bm25" and "sentence-transformers" to demonstrate the process. It covers setting up independent lexical and semantic retrieval engines, generating embeddings with "all-MiniLM-L6-v2", and merging rankings using the RRF formula with a "k_constant" of 60. A small, nine-document dataset from a public GitHub repository is used to illustrate how this hybrid approach balances keyword-based and contextual understanding for improved retrieval accuracy.
Key takeaway
For MLOps Engineers scaling RAG solutions to production, relying solely on semantic search is insufficient. You should implement a hybrid search strategy, integrating lexical methods like BM25 with semantic search, and fuse results using Reciprocal Rank Fusion. This approach improves retrieval accuracy by covering diverse query types, enhancing the overall robustness and performance of your RAG system.
Key insights
Hybrid search, combining lexical and semantic methods via Reciprocal Rank Fusion, enhances RAG system retrieval accuracy.
Principles
- Lexical search covers semantic search's blind spots.
- Rank fusion is superior to raw score addition, because BM25 scores and cosine similarities lie on different scales.
- RRF rewards high-ranking documents across lists.
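The second and third principles can be illustrated with a small, made-up example. The scores below are invented for illustration and are not taken from the original article. BM25 scores are unbounded while cosine similarities stay within [-1, 1], so adding them lets the BM25 ordering dominate. Fusing on rank positions gives both lists equal weight.

```python
# Hypothetical scores for four documents (illustrative values only).
bm25_scores = {"a": 14.0, "b": 11.9, "c": 7.5, "d": 0.3}
cosine_scores = {"a": 0.21, "b": 0.93, "c": 0.80, "d": 0.35}


def order_by(scores):
    """Return document ids sorted by score, best first."""
    return sorted(scores, key=scores.get, reverse=True)


def reciprocal_ranks(scores, k=60):
    """Map each document to 1 / (k + rank), where rank is its 1-based position."""
    return {doc: 1.0 / (k + rank)
            for rank, doc in enumerate(order_by(scores), start=1)}


# Raw score addition: the large BM25 values swamp the cosine similarities.
raw_sum = {doc: bm25_scores[doc] + cosine_scores[doc] for doc in bm25_scores}
raw_sum_order = order_by(raw_sum)

# Rank-based fusion: only positions matter, so both lists count equally.
bm25_rr = reciprocal_ranks(bm25_scores)
cosine_rr = reciprocal_ranks(cosine_scores)
fused = {doc: bm25_rr[doc] + cosine_rr[doc] for doc in bm25_scores}
rank_fused_order = order_by(fused)

print("BM25 only :", order_by(bm25_scores))
print("Raw sum   :", raw_sum_order)
print("Rank fused:", rank_fused_order)
```

With these values the raw sum reproduces the BM25 ordering exactly. Rank fusion moves document "b", which ranks near the top of both lists, ahead of "a", which is first lexically but last semantically.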
Method
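Implement BM25 and semantic search independently, then merge their full rankings using Reciprocal Rank Fusion (RRF). Each ranking contributes "1 / (k + rank)" for a document, where rank is the document's 1-based position in that ranking, and the document's final "RRF_score" is the sum of these contributions across both rankings.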
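A minimal sketch of the fusion step follows, assuming each retriever returns a list of document ids ordered best first. The function name "reciprocal_rank_fusion" and the sample ids are hypothetical. The "k_constant" default of 60 matches the value given in the article.

```python
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def reciprocal_rank_fusion(
    rankings: Sequence[Sequence[Hashable]],
    k_constant: int = 60,
) -> List[Tuple[Hashable, float]]:
    """Fuse several best-first rankings into one list of (doc_id, score)."""
    scores = defaultdict(float)
    for ranking in rankings:
        # rank is 1-based: the top document in a list contributes 1 / (k + 1).
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k_constant + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical rankings from the two retrievers.
bm25_ranking = ["d3", "d1", "d2"]
semantic_ranking = ["d1", "d2", "d3"]

fused = reciprocal_rank_fusion([bm25_ranking, semantic_ranking])
for doc_id, score in fused:
    print(f"{doc_id}: {score:.6f}")
```

A document missing from one ranking simply receives no contribution from it, so the sketch also works when the retrievers return different candidate sets.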
In practice
- Use "rank_bm25" for lexical search.
- Use "sentence-transformers" for embeddings.
- Apply RRF with a "k_constant" of 60.
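The two retrievers named above could be wired up roughly as follows. This is a sketch under stated assumptions, not the article's code. The whitespace tokenizer and the function names are hypothetical, "rank_bm25" is imported lazily inside the function that needs it, and the embedding model is passed in by the caller.

```python
from typing import List, Sequence


def tokenize(text: str) -> List[str]:
    # Naive lowercase whitespace tokenizer (an assumption for illustration).
    return text.lower().split()


def lexical_ranking(query: str, docs: Sequence[str]) -> List[int]:
    """Return document indices ordered by BM25 score, best first."""
    from rank_bm25 import BM25Okapi  # requires: pip install rank-bm25

    bm25 = BM25Okapi([tokenize(doc) for doc in docs])
    scores = bm25.get_scores(tokenize(query))
    return sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)


def semantic_ranking(query: str, docs: Sequence[str], model) -> List[int]:
    """Return document indices ordered by cosine similarity, best first.

    `model` is expected to behave like a sentence-transformers
    SentenceTransformer, i.e. to expose encode(texts, normalize_embeddings=True).
    """
    # With unit-length embeddings, the dot product equals cosine similarity.
    doc_vecs = model.encode(list(docs), normalize_embeddings=True)
    query_vec = model.encode([query], normalize_embeddings=True)[0]
    # A plain-Python dot product keeps the sketch dependency-light;
    # it works on NumPy arrays as well as on lists.
    sims = [float(sum(q * d for q, d in zip(query_vec, vec))) for vec in doc_vecs]
    return sorted(range(len(docs)), key=lambda i: sims[i], reverse=True)
```

With "sentence-transformers" installed, the model argument would be SentenceTransformer("all-MiniLM-L6-v2"). The two index lists returned by these functions are the inputs that RRF then fuses with a "k_constant" of 60.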
Topics
- Hybrid Search
- RAG Systems
- BM25
- Semantic Search
- Reciprocal Rank Fusion
- Information Retrieval
- Sentence Transformers
Best for: Machine Learning Engineer, AI Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by MachineLearningMastery.com.