Larger Context Windows Don’t Fix RAG — So I Built a System That Does

2026-06-13 · Source: Towards Data Science · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, long

Summary

A new dataset Q&A system, built for messy CSV files, revealed a critical flaw in Retrieval-Augmented Generation (RAG) pipelines when handling computation queries. In initial testing on a 100,000-row dataset, RAG confidently reported a total spend of less than half the correct figure of $1,140,033.24. Expanding the context window from 4k to 128k tokens (up to 8,000 rows in context) made the problem worse through what the author calls "Error Observability Collapse": answers that were still more than 50% wrong became harder to detect as responses grew longer and more authoritative. The author built a benchmark comparing a RAG simulation against a Semantic Engine, which performs deterministic full scans in under 200 ms. The proposed solution is a QueryRouter that classifies queries as "COMPUTATION" (sent to the Semantic Engine) or "RETRIEVAL" (sent to RAG); it routed 9 of 9 test queries correctly and returns exact answers for aggregations.
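
The core failure is structural: a retrieval pipeline can only aggregate the rows that made it into the prompt, while a full scan touches every row. The sketch below is a minimal illustration under stated assumptions, not the author's benchmark. The function names `full_scan_total` and `context_window_total` are hypothetical, the data is synthetic (100,000 identical rows), and "retrieval" is simplified to taking the first k rows. A real retriever ranks rows by similarity, but any top-k selection shares the same blind spot.

```python
from decimal import Decimal


def full_scan_total(rows: list[dict], column: str) -> Decimal:
    """Deterministic aggregation: every row contributes to the sum."""
    return sum((Decimal(r[column]) for r in rows), Decimal("0"))


def context_window_total(rows: list[dict], column: str, k: int) -> Decimal:
    """Simulated RAG (hypothetical): only k rows fit in the prompt.

    Taking the first k rows stands in for a similarity-ranked top-k;
    either way, the remaining len(rows) - k rows are never seen.
    """
    return sum((Decimal(r[column]) for r in rows[:k]), Decimal("0"))


# Synthetic data (assumption): 100,000 rows, each with amount 12.50.
rows = [{"amount": "12.50"} for _ in range(100_000)]

exact = full_scan_total(rows, "amount")
partial = context_window_total(rows, "amount", k=8_000)
print(f"full scan: {exact}  in-context sum: {partial}  coverage: {partial / exact:.0%}")
```

Growing k only raises the ceiling on coverage; unless k reaches the full row count, the in-context sum cannot equal the full-scan total.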

Key takeaway

For AI engineers building data Q&A systems: recognize that RAG is fundamentally unsuited for aggregation or computation on structured data. Implement an intent-based QueryRouter that sends analytical queries, identified by aggregation verbs or numeric comparisons, to a dedicated, deterministic computation engine. This prevents "Error Observability Collapse," in which RAG returns confidently wrong answers, so analytical queries get exact results while RAG handles the lookups it is suited for.

Key insights

RAG systems fail at data aggregation, producing confident but incorrect answers, a problem exacerbated by larger context windows.

Principles

RAG is for retrieval, not computation.
Larger context windows increase confidence, not accuracy.
Deterministic computation prevents silent errors.

Method

Implement a QueryRouter to classify queries based on aggregation verbs, numeric comparisons, or retrieval signals. Route computation queries to a deterministic Semantic Engine for full-scan processing, and lookup queries to RAG. Default to computation for ambiguous queries.
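
The routing rule described above can be sketched as a small regex classifier. This is a minimal sketch, not the implementation in the author's repository: `QueryRouter`, `COMPUTATION`, and `RETRIEVAL` come from the article, but the `Route` enum, the `route` method, and every keyword pattern are hypothetical choices for illustration. The precedence follows the method as stated: computation signals win, retrieval signals come next, and anything ambiguous defaults to computation.

```python
import re
from enum import Enum


class Route(str, Enum):
    COMPUTATION = "COMPUTATION"  # deterministic full-scan engine
    RETRIEVAL = "RETRIEVAL"      # RAG lookup


# Hypothetical signal lists; tune them against your own query logs.
AGGREGATION = re.compile(
    r"\b(total|sum|average|avg|median|count|how many|percent(age)?|"
    r"max(imum)?|min(imum)?|highest|lowest)\b",
    re.IGNORECASE,
)
NUMERIC_COMPARISON = re.compile(
    r"(>=|<=|>|<|\b(?:more|greater|less|fewer) than\b"
    r"|\b(?:above|below|over|under|between)\s+\$?\d)",
    re.IGNORECASE,
)
RETRIEVAL_SIGNAL = re.compile(
    r"\b(show|find|look up|lookup|describe|what does|which|details?|record for)\b",
    re.IGNORECASE,
)


class QueryRouter:
    def route(self, query: str) -> Route:
        # Aggregations and numeric filters need every row, so they go to the exact engine.
        if AGGREGATION.search(query) or NUMERIC_COMPARISON.search(query):
            return Route.COMPUTATION
        # Pure lookups are what RAG is good at.
        if RETRIEVAL_SIGNAL.search(query):
            return Route.RETRIEVAL
        # Ambiguous: prefer an exact (possibly slower) answer over a silent approximation.
        return Route.COMPUTATION


router = QueryRouter()
for q in ["What is the total spend?", "Show the record for customer 4512", "Tell me about Q3"]:
    print(f"{router.route(q).value:<12} {q}")
```

A regex pass costs microseconds and needs no model call, which keeps routing latency negligible next to either backend. The trade-off is recall: a phrasing the patterns miss falls through to the computation default rather than to RAG.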

In practice

Route queries containing "total," "average," or "percentage" to a computation engine.
Use a regex-based classifier for low-latency query routing.
Benchmark RAG for numerical accuracy on your datasets.
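
Benchmarking RAG for numerical accuracy comes down to comparing the number in each free-text answer against a ground truth computed by a deterministic full scan. The sketch below assumes that ground truth is already available; `extract_number`, `relative_error`, `score`, and the 0.1% tolerance are hypothetical choices, and the sample values are made up for illustration. Taking the first numeric literal in the answer is a crude heuristic that a real harness would likely need to refine (an answer mentioning "Q3" would be parsed as 3).

```python
import re
from decimal import Decimal
from typing import Optional

# Matches literals such as 480, 1,140,033.24 or -12.5 (first match wins).
NUMBER = re.compile(r"-?\d[\d,]*(?:\.\d+)?")


def extract_number(answer: str) -> Optional[Decimal]:
    """Pull the first numeric literal out of a free-text answer."""
    match = NUMBER.search(answer)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))


def relative_error(predicted: Decimal, truth: Decimal) -> Decimal:
    return abs(predicted - truth) / abs(truth)


def score(answer: str, truth: Decimal, tolerance: Decimal = Decimal("0.001")) -> dict:
    """Grade one RAG answer against the full-scan ground truth."""
    predicted = extract_number(answer)
    if predicted is None:
        # No number at all counts as a miss, not as a pass.
        return {"predicted": None, "rel_error": None, "correct": False}
    err = relative_error(predicted, truth)
    return {"predicted": predicted, "rel_error": err, "correct": err <= tolerance}


# Illustrative values only: ground truth 1000.00, RAG answered 480.00.
print(score("The total is $480.00.", Decimal("1000.00")))
```

Logging the relative error, not just pass/fail, is what exposes the pattern the article warns about: answers that read as confident while still being off by half.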

Topics

Retrieval-Augmented Generation
Query Routing
Data Aggregation
Semantic Engine
Context Windows
Error Observability

Code references

Emmimal/context-window-engine

Best for: AI Engineer, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.