SCAR: Semantic Continuity-Aware Retrieval for Efficient Context Expansion in RAG
Summary
SCAR (Semantic Continuity-Aware Retrieval) is an adaptive retrieval policy designed to mitigate boundary fragmentation in Retrieval-Augmented Generation (RAG) systems, a common side effect of fixed-length chunking that degrades recall. SCAR selectively expands to neighboring chunks by weighing query-neighbor relevance against a structural continuity penalty. It uses a relative expansion threshold, which makes its decisions scale-invariant and lets the policy carry across embedding models without recalibration. Evaluated across four diverse corpora, SCAR achieved 92.8% recall on boundary-fragmented queries using only 7.84 chunks, compared with 10.16 chunks for static windowing, a statistically significant 22.9% reduction (p<0.0001). The policy transfers across the text-embedding-3-large, BGE-large-en-v1.5, and zembed-1 models. Furthermore, RAGAS evaluation on the 10-K corpus confirmed that SCAR preserves generation faithfulness while reducing context tokens by 27.1%.
Key takeaway
For RAG developers optimizing context window efficiency and retrieval accuracy, SCAR is a promising adaptive retrieval strategy. By expanding context only where semantic continuity and relevance justify it, SCAR reached 92.8% recall on boundary-fragmented queries while retrieving 22.9% fewer chunks than static windowing, and on the 10-K corpus it cut context tokens by 27.1% without hurting generation faithfulness. Shorter contexts can also lower inference costs, and the policy's transferability across embedding models means it should not need recalibration when the embedding model changes.
Key insights
SCAR adaptively expands RAG context by balancing semantic relevance and structural continuity to improve recall and efficiency.
Principles
- Adaptive context expansion reduces token overhead while maintaining recall.
- Balancing query-neighbor relevance with structural continuity improves retrieval.
- Scale-invariant decision rules enhance transferability across embedding models.
Method
SCAR selectively expands neighboring chunks by weighing query-neighbor relevance against a structural continuity penalty, using a relative expansion threshold tied to the retrieved chunk's query-relevance.
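The decision rule described above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: the function name `should_expand`, the parameters `tau` and `lam`, their default values, and the choice of "one minus cosine similarity between the retrieved chunk and its neighbor" as the continuity penalty are all assumptions made for this example.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def should_expand(
    query: Sequence[float],
    retrieved: Sequence[float],
    neighbor: Sequence[float],
    tau: float = 0.8,  # hypothetical relative threshold
    lam: float = 0.5,  # hypothetical weight on the continuity penalty
) -> bool:
    """Decide whether to add a neighboring chunk to the context.

    Assumed form of the rule: the neighbor's query relevance, minus a
    penalty for breaking continuity with the retrieved chunk, must reach
    a fraction `tau` of the retrieved chunk's own query relevance.
    """
    rel_retrieved = cosine(query, retrieved)
    rel_neighbor = cosine(query, neighbor)
    # Assumed penalty: how dissimilar the neighbor is to the retrieved chunk.
    discontinuity = 1.0 - cosine(retrieved, neighbor)
    score = rel_neighbor - lam * discontinuity
    # The threshold is relative to the retrieved chunk's relevance,
    # not a fixed absolute similarity value.
    return score >= tau * rel_retrieved


query = [1.0, 0.0]
retrieved = [1.0, 0.0]
on_topic_neighbor = [1.0, 0.0]
off_topic_neighbor = [0.0, 1.0]

expand_on_topic = should_expand(query, retrieved, on_topic_neighbor)
expand_off_topic = should_expand(query, retrieved, off_topic_neighbor)
```

Because the threshold is expressed as a fraction of the retrieved chunk's relevance, the cutoff moves with the similarity range a given embedding model produces, which is the intuition behind comparing against a relative rather than an absolute value.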
In practice
- Implement adaptive chunk expansion in RAG pipelines.
- Prioritize semantic continuity in context retrieval.
- Reduce RAG inference costs by optimizing context length.
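To make the contrast with static windowing concrete, the sketch below compares a fixed window with a selective expansion step over chunk indices. It is only an illustration under stated assumptions: `static_window`, `adaptive_expand`, and the `keep_neighbor` callback are hypothetical names, and the callback stands in for whatever expansion rule a pipeline actually uses.

```python
from typing import Callable, List, Set


def static_window(hits: List[int], n_chunks: int, width: int = 1) -> List[int]:
    """Always include `width` neighbors on each side of every retrieved chunk."""
    selected: Set[int] = set()
    for i in hits:
        for j in range(i - width, i + width + 1):
            if 0 <= j < n_chunks:
                selected.add(j)
    return sorted(selected)


def adaptive_expand(
    hits: List[int],
    n_chunks: int,
    keep_neighbor: Callable[[int, int], bool],
) -> List[int]:
    """Include an adjacent chunk only when `keep_neighbor(hit, neighbor)` approves it."""
    selected: Set[int] = set(hits)
    for i in hits:
        for j in (i - 1, i + 1):
            if 0 <= j < n_chunks and keep_neighbor(i, j):
                selected.add(j)
    return sorted(selected)


hits = [2, 7]      # indices of the chunks returned by the retriever
n_chunks = 10      # total chunks in the document

# Toy rule for the example: only chunk 3 continues the passage found in chunk 2.
approved = {(2, 3)}

static_context = static_window(hits, n_chunks)
adaptive_context = adaptive_expand(hits, n_chunks, lambda i, j: (i, j) in approved)
```

In this toy case the static window sends six chunks to the generator while the selective version sends three; the actual savings in a real pipeline depend on the corpus and on the expansion rule.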
Topics
- Retrieval-Augmented Generation
- Context Expansion
- Semantic Continuity
- Information Retrieval
- Embedding Models
- Chunking Strategies
Best for: AI Architect, AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.