Semantic Chunking vs Fixed Chunking: Why Your RAG’s Retrieval Quality Starts Before the Query

2026-04-20 · Source: Towards AI - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Intermediate, short

Summary

This article, Part 2 of a 5-part series on production-grade RAG systems, describes a two-stage chunking strategy for improving retrieval quality. The first stage is a fixed character-based chunker that creates large parent documents, configured with `chunk_size=2000` and `chunk_overlap=400`. Each parent chunk then feeds a semantic sliding window chunker, which uses `window_size=200` and `overlap=40` to generate smaller, overlapping word-level windows. The key step is semantic merging: adjacent windows are concatenated when their cosine similarity, calculated from 768-dimensional `nomic-embed-text` embeddings via Ollama, exceeds a `threshold` of `0.60`. This doubles the number of embedding calls, but the article deems the extra cost "essentially free" because the embeddings run locally through Ollama, and it argues that the merged chunks are more coherent than chunks cut at arbitrary fixed boundaries.
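The first stage can be illustrated with a minimal sketch of a fixed character chunker. The function name `fixed_chunks` and its exact behavior at the end of the text are assumptions made for illustration, not the article's implementation; only the `chunk_size` and `chunk_overlap` values come from the summary.

```python
def fixed_chunks(text: str, chunk_size: int = 2000, chunk_overlap: int = 400) -> list[str]:
    """Split text into character chunks; consecutive chunks share chunk_overlap characters.

    Hypothetical helper: the name and the end-of-text handling are assumptions.
    """
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap  # 1600 characters with the defaults
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


if __name__ == "__main__":
    sample = "".join(str(i % 10) for i in range(5000))
    parents = fixed_chunks(sample)
    print(len(parents), [len(p) for p in parents])
```

With the defaults, each new parent chunk starts 1600 characters after the previous one, so the last 400 characters of one chunk reappear at the start of the next.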

Key takeaway

For AI engineers building RAG systems, the article's message is that chunking choices shape retrieval quality before any query is run, and that semantic chunking deserves priority over simple fixed-size splits. Implement a two-stage approach: use fixed chunking for large parent documents, then apply semantic sliding window chunking with similarity-based merging for the smaller, indexed child chunks. The strategy requires twice as many embedding calls, but it yields more coherent retrieval units and stays inexpensive when the embedding model runs locally through Ollama.

Key insights

Effective RAG retrieval hinges on semantic chunking, not just fixed-size splits, to ensure coherent context.

Principles

Chunking quality precedes retrieval effectiveness.
Semantic boundaries improve context preservation.
Local embedding reduces cost of multi-stage chunking.

Method

A two-stage chunking process: first, fixed character chunking for large parent documents, followed by semantic sliding window chunking with cosine similarity-based merging for child retrieval units.
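The merging step of the second stage might look like the sketch below. It is an illustration under stated assumptions, not the article's code: `merge_similar_windows` and `toy_embed` are hypothetical names, the sketch compares each window with its immediate predecessor, and `toy_embed` is a tiny word-count stand-in so the example runs without a model. In a real pipeline, `embed` would return the 768-dimensional `nomic-embed-text` vector obtained from Ollama.

```python
import math
from collections import Counter
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def merge_similar_windows(
    windows: list[str],
    embed: Callable[[str], Vector],
    threshold: float = 0.60,
) -> list[str]:
    """Concatenate adjacent windows whose similarity exceeds the threshold."""
    if not windows:
        return []
    vectors = [embed(w) for w in windows]  # one embedding call per window
    merged = [windows[0]]
    for i in range(1, len(windows)):
        if cosine_similarity(vectors[i - 1], vectors[i]) > threshold:
            merged[-1] = merged[-1] + " " + windows[i]  # same topic: extend the chunk
        else:
            merged.append(windows[i])  # topic shift: start a new chunk
    return merged


# Stand-in embedder (assumption): word counts over a fixed toy vocabulary.
VOCAB = ["cat", "dog", "pet", "stock", "market", "price"]


def toy_embed(text: str) -> list[float]:
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in VOCAB]


if __name__ == "__main__":
    demo = ["cat dog pet", "dog pet cat", "stock market price"]
    print(merge_similar_windows(demo, toy_embed))
```

This sketch concatenates windows as they are, so text in the overlap between two merged windows appears twice; whether and how the article removes that duplication is not stated in this summary.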

In practice

Use `chunk_size=2000`, `chunk_overlap=400` for parent chunks.
Apply `window_size=200`, `overlap=40` for child windows.
Set semantic merge `threshold` to `0.60` for aggressive merging.
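To make the child-window settings concrete, here is a minimal sketch of a word-level sliding window using `window_size=200` and `overlap=40`. The function name `sliding_windows` and the whitespace-based word splitting are assumptions for illustration; the article's tokenization may differ.

```python
def sliding_windows(text: str, window_size: int = 200, overlap: int = 40) -> list[str]:
    """Return overlapping word-level windows; consecutive windows share `overlap` words.

    Hypothetical helper: splitting on whitespace is an assumption.
    """
    if overlap >= window_size:
        raise ValueError("overlap must be smaller than window_size")
    words = text.split()
    step = window_size - overlap  # 160 words with the defaults
    windows = []
    for start in range(0, len(words), step):
        windows.append(" ".join(words[start:start + window_size]))
        if start + window_size >= len(words):
            break  # this window already reaches the last word
    return windows


if __name__ == "__main__":
    parent = " ".join(f"w{i}" for i in range(500))
    children = sliding_windows(parent)
    print(len(children), [len(c.split()) for c in children])
```

With these values each window starts 160 words after the previous one, so a 500-word input produces three windows covering words 0 to 199, 160 to 359, and 320 to 499.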

Topics

RAG Systems
Semantic Chunking
Fixed Chunking
Parent-Child Architecture
Document Chunking

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.