Semantic Chunking vs Fixed Chunking: Why Your RAG’s Retrieval Quality Starts Before the Query
Summary
This article, Part 2 of a 5-part series on production-grade RAG systems, details a two-stage chunking strategy to improve retrieval quality. The first stage is a fixed character-based chunker that creates large parent documents, configured with `chunk_size=2000` and `chunk_overlap=400`. These parent chunks feed a semantic sliding window chunker, which uses `window_size=200` and `overlap=40` to generate smaller, overlapping word-level windows. The core innovation is semantic merging: adjacent windows are concatenated when their cosine similarity, computed from 768-dimensional `nomic-embed-text` embeddings via Ollama, exceeds a `threshold` of `0.60`. Merging doubles the number of embedding calls, but the article deems the extra cost "essentially free" because the embeddings run locally through Ollama, and the resulting chunks are more coherent than those cut at arbitrary fixed boundaries.
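To make the embedding step concrete, here is a minimal sketch of requesting a `nomic-embed-text` vector from a local Ollama server using only the standard library. It assumes Ollama is listening on its default port and exposes the `/api/embeddings` endpoint (newer releases also offer `/api/embed`); the helper names `build_request` and `embed` are hypothetical and not taken from the original article.

```python
import json
import urllib.request

# Assumption: a local Ollama server on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def build_request(text, model="nomic-embed-text"):
    """Build the POST request for one embedding (no network access here)."""
    body = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def embed(text, model="nomic-embed-text"):
    """Send the request and return the embedding as a list of floats.

    Requires a running Ollama server with the model already pulled.
    """
    with urllib.request.urlopen(build_request(text, model), timeout=30) as resp:
        return json.loads(resp.read())["embedding"]
```

Because the server runs locally, each extra call costs compute time rather than API fees, which is the basis for the article's "essentially free" remark.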
Key takeaway
For AI engineers building RAG systems, prioritizing semantic chunking over simple fixed-size splits is critical for retrieval quality. Implement a two-stage approach: use fixed chunking for large parent documents, then apply semantic sliding window chunking with similarity-based merging for the smaller, indexed child chunks. Although this strategy requires two embedding passes, it yields more coherent retrieval units and stays cost-effective when embeddings run locally through a tool like Ollama. More coherent chunks, in turn, make search results more relevant.
Key insights
Effective RAG retrieval hinges on semantic chunking, not just fixed-size splits, to ensure coherent context.
Principles
- Chunking quality precedes retrieval effectiveness.
- Semantic boundaries improve context preservation.
- Local embedding reduces cost of multi-stage chunking.
Method
A two-stage chunking process: first, fixed character chunking for large parent documents, followed by semantic sliding window chunking with cosine similarity-based merging for child retrieval units.
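The merging step can be sketched as follows. This is an illustrative reading of the method, not the article's own code: it assumes each window is compared with the window immediately before it, and that `embed` is any callable mapping text to a vector. The names `cosine` and `merge_adjacent` are hypothetical.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_adjacent(windows, embed, threshold=0.60):
    """Concatenate each window onto the previous chunk when the two
    neighbouring windows are more similar than `threshold`."""
    if not windows:
        return []
    merged = [windows[0]]
    prev_vec = embed(windows[0])
    for window in windows[1:]:
        vec = embed(window)
        if cosine(prev_vec, vec) > threshold:
            # Simplification: plain concatenation keeps the overlapping
            # words of both windows instead of de-duplicating them.
            merged[-1] = merged[-1] + " " + window
        else:
            merged.append(window)
        prev_vec = vec
    return merged
```

In this sketch every window is embedded once for the comparison. The merged chunks would then need to be embedded again for indexing, which is one plausible reading of the doubled embedding calls mentioned in the summary.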
In practice
- Use `chunk_size=2000`, `chunk_overlap=400` for parent chunks.
- Apply `window_size=200`, `overlap=40` for child windows.
- Set semantic merge `threshold` to `0.60` for aggressive merging.
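The two sets of size parameters above can be sketched as a pair of small helpers: one slices text by characters for parent chunks, the other slices by words for child windows. The function names `fixed_chunks` and `word_windows` are hypothetical, and whitespace splitting is an assumed stand-in for whatever tokenization the original article uses.

```python
def fixed_chunks(text, chunk_size=2000, chunk_overlap=400):
    """Split text into character chunks that overlap by `chunk_overlap`."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks


def word_windows(text, window_size=200, overlap=40):
    """Split text into word windows that overlap by `overlap` words."""
    if overlap >= window_size:
        raise ValueError("overlap must be smaller than window_size")
    words = text.split()
    step = window_size - overlap
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break  # the last window already reaches the final word
    return windows
```

With these defaults, each parent chunk starts 1,600 characters after the previous one and each child window starts 160 words after the previous one, so neighbouring units always share a 20% overlap.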
Topics
- RAG Systems
- Semantic Chunking
- Fixed Chunking
- Parent-Child Architecture
- Document Chunking
Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.