Larger Context Windows Don’t Fix RAG — So I Built a System That Does
Summary
A dataset Q&A system built for messy CSV files exposed a critical flaw in Retrieval-Augmented Generation (RAG) pipelines on computation queries. In initial testing on a 100,000-row dataset, RAG confidently reported less than half of the correct total spend of $1,140,033.24. Growing the context window from 4k to 128k tokens, which fit up to 8,000 rows, worsened what the author calls "Error Observability Collapse": answers that were more than 50% wrong became harder to detect as responses grew longer and more authoritative. The author built a benchmark comparing a RAG simulation against a Semantic Engine that performs deterministic full scans in under 200 ms. The proposed solution is a QueryRouter that classifies each query as "COMPUTATION" (sent to the Semantic Engine) or "RETRIEVAL" (sent to RAG); it achieved 9/9 routing accuracy and ensures exact answers for aggregations.
Key takeaway
If you are an AI engineer building a data Q&A system, recognize that RAG is fundamentally unsuited to aggregation or computation over structured data. Implement an intent-based QueryRouter that sends analytical queries, identified by aggregation verbs or numeric comparisons, to a dedicated, deterministic computation engine. This prevents "Error Observability Collapse," in which RAG gives confidently wrong answers, so the system returns accurate results for critical analysis tasks while RAG handles the retrieval it is suited to.
Key insights
RAG systems fail at data aggregation, producing confident but incorrect answers, a problem exacerbated by larger context windows.
Principles
- RAG is for retrieval, not computation.
- Larger context windows increase confidence, not accuracy.
- Deterministic computation prevents silent errors.
Method
Implement a QueryRouter to classify queries based on aggregation verbs, numeric comparisons, or retrieval signals. Route computation queries to a deterministic Semantic Engine for full-scan processing, and lookup queries to RAG. Default to computation for ambiguous queries.
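The routing rule above can be sketched as a small regex classifier. This is a minimal illustration, not the author's implementation: the function name `route` and every pattern list below are assumptions chosen for the example, and a real router would need patterns tuned to its own query traffic.

```python
import re

# Hypothetical signal patterns; the original article's exact lists are not reproduced here.
AGGREGATION = re.compile(
    r"\b(total|sum|average|mean|median|count|percentage|percent|how many|how much)\b",
    re.IGNORECASE,
)
NUMERIC_COMPARISON = re.compile(
    r"\b(greater than|less than|more than|fewer than|at least|at most|above|below)\b"
    r"|[<>]=?\s*\d",
    re.IGNORECASE,
)
RETRIEVAL = re.compile(
    r"\b(show|find|look up|lookup|describe|details)\b",
    re.IGNORECASE,
)


def route(query: str) -> str:
    """Classify a query as COMPUTATION or RETRIEVAL."""
    # Aggregation verbs and numeric comparisons are checked first, so a query
    # such as "find the average spend" still goes to the computation engine.
    if AGGREGATION.search(query) or NUMERIC_COMPARISON.search(query):
        return "COMPUTATION"
    if RETRIEVAL.search(query):
        return "RETRIEVAL"
    # Ambiguous queries default to computation, as the method describes.
    return "COMPUTATION"
```

Checking the computation signals before the retrieval signals is a design choice in this sketch: sending a lookup to the full-scan engine costs little, while sending an aggregation to RAG risks a confidently wrong number.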
In practice
- Route "total," "average," and "percentage" queries to a computation engine.
- Use a regex-based classifier for low-latency query routing.
- Benchmark RAG for numerical accuracy on your datasets.
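One way to start the benchmarking step is to measure how far a total drifts when only part of the dataset is visible, which is the situation a context-limited pipeline is in. The sketch below is an assumption-laden simulation, not the author's benchmark: it uses synthetic data, stores amounts as integer cents to avoid floating-point drift, and models "what fits in context" as simply the first `max_rows` rows. All names (`full_scan_total`, `visible_rows_total`, `benchmark`) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BenchmarkResult:
    exact_total: int       # deterministic full-scan total, in cents
    visible_total: int     # total over the rows that fit in context, in cents
    relative_error: float  # |exact - visible| / exact


def full_scan_total(amounts_cents: list[int]) -> int:
    """Deterministic aggregation over every row."""
    return sum(amounts_cents)


def visible_rows_total(amounts_cents: list[int], max_rows: int) -> int:
    """Simulated context-limited answer: only the first max_rows rows are seen."""
    return sum(amounts_cents[:max_rows])


def benchmark(amounts_cents: list[int], max_rows: int) -> BenchmarkResult:
    exact = full_scan_total(amounts_cents)
    visible = visible_rows_total(amounts_cents, max_rows)
    relative_error = abs(exact - visible) / exact if exact else 0.0
    return BenchmarkResult(exact, visible, relative_error)


# Synthetic data (an assumption for illustration): 100,000 rows of 1,000 cents each.
amounts = [1000] * 100_000
result = benchmark(amounts, max_rows=8_000)
print(f"exact={result.exact_total} visible={result.visible_total} "
      f"relative_error={result.relative_error:.2%}")
```

On real data, the `visible_rows_total` stand-in would be replaced by the number parsed from the RAG pipeline's actual response, so the relative error reflects the model's answer and not just row truncation.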
Topics
- Retrieval-Augmented Generation
- Query Routing
- Data Aggregation
- Semantic Engine
- Context Windows
- Error Observability
Best for: AI Engineer, Machine Learning Engineer, Data Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.