SCAR: Semantic Continuity-Aware Retrieval for Efficient Context Expansion in RAG

Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing, Information Retrieval · Depth: Expert, quick

Summary

SCAR (Semantic Continuity-Aware Retrieval) is an adaptive retrieval policy that mitigates boundary fragmentation in Retrieval-Augmented Generation (RAG) systems. Boundary fragmentation is common with fixed-length chunking: the evidence a query needs is split across adjacent chunks, which degrades recall. SCAR selectively expands to neighboring chunks by balancing query-neighbor relevance against a structural continuity penalty. It uses a relative expansion threshold, which makes its decisions scale-invariant across embedding models, so no recalibration is needed. Across four diverse corpora, SCAR achieved 92.8% recall on boundary-fragmented queries while using an average of 7.84 chunks. This is 22.9% fewer than the 10.16 chunks used by static windowing, and the reduction is statistically significant (p < 0.0001). The policy transfers across three embedding models: text-embedding-3-large, BGE-large-en-v1.5, and zembed-1. On the 10-K corpus, a RAGAS evaluation confirmed that SCAR preserves generation faithfulness while reducing context tokens by 27.1%.
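
The following illustrative sketch shows why a relative threshold helps when switching embedding models. It is not SCAR's published rule. The function names, the absolute cutoff of 0.5, the ratio tau = 0.8, and the 1.6x scale factor between two hypothetical models are all assumptions. An absolute cutoff tuned for one model changes its decision when scores are uniformly rescaled. A cutoff expressed as a fraction of the anchor chunk's own score does not.

```python
def absolute_rule(anchor_sim: float, neighbor_sim: float, cutoff: float = 0.5) -> bool:
    # Hypothetical fixed cutoff on the raw neighbor score; ignores the anchor entirely.
    return neighbor_sim >= cutoff


def relative_rule(anchor_sim: float, neighbor_sim: float, tau: float = 0.8) -> bool:
    # Hypothetical cutoff expressed as a fraction of the anchor chunk's own query relevance.
    return neighbor_sim >= tau * anchor_sim


# (anchor_sim, neighbor_sim) pairs scored by a hypothetical model A ...
pairs_a = [(0.45, 0.40), (0.45, 0.30)]
# ... and by a hypothetical model B whose scores are uniformly 1.6x larger.
pairs_b = [(a * 1.6, n * 1.6) for a, n in pairs_a]

for name, pairs in (("A", pairs_a), ("B", pairs_b)):
    absolute = [absolute_rule(a, n) for a, n in pairs]
    relative = [relative_rule(a, n) for a, n in pairs]
    print(name, absolute, relative)
# A [False, False] [True, False]
# B [True, False] [True, False]
```

A pure ratio test cancels only multiplicative differences between score scales. If two models' scores also differed by an additive offset, this simple form would not fully neutralize the difference. Treat the example as intuition for the scale-invariance claim, not as a proof of it.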

Key takeaway

For RAG developers who must balance context-window cost against retrieval accuracy, SCAR is an adaptive retrieval strategy worth evaluating. It expands context only where relevance and semantic continuity justify it. In the reported experiments, this let SCAR reach 92.8% recall on boundary-fragmented queries while retrieving 22.9% fewer chunks than static windowing. On the 10-K corpus, it also cut context tokens by 27.1% without loss of generation faithfulness. Fewer context tokens mean lower inference cost. Because the relative threshold carried over to three embedding models without recalibration, adopting the policy should not require per-model tuning.

Key insights

SCAR adaptively expands RAG context by balancing semantic relevance and structural continuity to improve recall and efficiency.

Principles

Method

SCAR selectively expands a retrieved chunk to its neighbors by weighing query-neighbor relevance against a structural continuity penalty. The expansion threshold is relative: it is tied to the retrieved chunk's own query relevance rather than fixed in absolute score units.
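
The following is one possible way to turn this description into code, under explicit assumptions. A document's chunks are held in reading order with precomputed query similarities. The continuity penalty is modeled as a hypothetical geometric decay, `decay ** hop`, that grows with distance from the retrieved (anchor) chunk. A neighbor is absorbed only while its discounted score stays at or above `tau` times the anchor's own score. The names `expand_anchor`, `tau`, `decay`, and `max_hops`, and their default values, are illustrative and are not taken from the paper.

```python
def expand_anchor(query_sims: list[float], anchor: int, tau: float = 0.8,
                  decay: float = 0.9, max_hops: int = 2) -> list[int]:
    """Return the sorted chunk indices kept around a retrieved anchor chunk.

    query_sims[i] is the query similarity of chunk i, with chunks in reading
    order. Hypothetical rule: a neighbor `hop` positions away scores
    query_sims[j] * decay**hop (the decay stands in for a continuity penalty)
    and is kept only if that score is at least tau * query_sims[anchor].
    Assumes the anchor's score is positive.
    """
    threshold = tau * query_sims[anchor]  # relative to the anchor's own relevance
    kept = [anchor]
    for step in (-1, 1):  # expand leftward, then rightward
        for hop in range(1, max_hops + 1):
            j = anchor + step * hop
            if j < 0 or j >= len(query_sims):
                break  # ran off the start or end of the document
            if query_sims[j] * decay ** hop < threshold:
                break  # first failing neighbor ends expansion in this direction
            kept.append(j)
    return sorted(kept)


scores = [0.20, 0.55, 0.60, 0.30]
print(expand_anchor(scores, anchor=2))                     # [1, 2]
print(expand_anchor([s * 0.5 for s in scores], anchor=2))  # [1, 2]
```

Take the third chunk as the anchor. The sketch absorbs only its left neighbor and returns `[1, 2]`. Halving every score returns the same window, because the discounted neighbor scores and the threshold scale by the same factor. Stopping at the first failing neighbor keeps the expanded window contiguous, which suits the goal of reassembling text split across chunk boundaries.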

Best for: AI Architect, AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.