Larger Context Windows Don’t Fix RAG — So I Built a System That Does

Source: Towards Data Science · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering) · Depth: Intermediate, long

Summary

A dataset Q&A system built for messy CSV files exposed a critical flaw in Retrieval-Augmented Generation (RAG) pipelines: they cannot reliably answer computation queries. In initial testing on a 100,000-row dataset, RAG confidently returned less than half of the correct total spend of $1,140,033.24. Increasing the context window from 4k to 128k tokens (and up to 8,000 rows) did not fix this. Instead it worsened "Error Observability Collapse": answers that were more than 50% wrong became harder to detect as the responses grew longer and more authoritative. The author developed a benchmark comparing a RAG simulation against a Semantic Engine that performs deterministic full scans in under 200 ms. The proposed solution is a QueryRouter that classifies each query as "COMPUTATION" (sent to the Semantic Engine) or "RETRIEVAL" (sent to RAG). It achieved 9/9 routing accuracy and returns exact answers for aggregations.

Key takeaway

AI engineers building data Q&A systems should recognize that RAG is fundamentally unsuited to aggregation or computation over structured data. Implement an intent-based QueryRouter that sends analytical queries, identified by aggregation verbs or numeric comparisons, to a dedicated, deterministic computation engine, and leaves retrieval queries to RAG. This prevents "Error Observability Collapse", in which RAG returns confidently wrong answers, so the system delivers accurate results for critical data analysis tasks.

Key insights

RAG systems fail at data aggregation, producing confident but incorrect answers, a problem exacerbated by larger context windows.
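
To make the aggregation failure concrete, here is a minimal sketch in Python. It is not the author's benchmark: the function names, the `spend` column, and the uniform synthetic data are all assumptions chosen for illustration. It only shows why a sum computed over the rows that fit in a context window cannot match a sum over the full dataset.

```python
from typing import Dict, List

Row = Dict[str, float]


def full_scan_total(rows: List[Row], column: str) -> float:
    """Deterministic aggregation: every row contributes to the total."""
    return sum(row[column] for row in rows)


def context_limited_total(rows: List[Row], column: str, max_rows: int) -> float:
    """Hypothetical stand-in for a RAG answer: only the rows that fit
    in the context window are visible, so the rest are silently ignored."""
    return sum(row[column] for row in rows[:max_rows])


# Synthetic data (an assumption for illustration, not the article's dataset).
rows = [{"spend": 10.0} for _ in range(100_000)]

exact = full_scan_total(rows, "spend")
partial = context_limited_total(rows, "spend", max_rows=8_000)

# Fraction of the true total that the context-limited answer recovers.
coverage = partial / exact
```

With 8,000 of 100,000 rows visible, the context-limited total covers only a small fraction of the true value, and nothing in the answer itself signals the shortfall.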

Method

Implement a QueryRouter to classify queries based on aggregation verbs, numeric comparisons, or retrieval signals. Route computation queries to a deterministic Semantic Engine for full-scan processing, and lookup queries to RAG. Default to computation for ambiguous queries.
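
The routing rule above can be sketched as follows. This is a hedged illustration rather than the author's QueryRouter: the keyword lists, the regular expressions, the `route` function, and the choice to let computation signals take precedence over retrieval signals are all assumptions. Only the two route labels and the default-to-computation rule come from the description above.

```python
import re
from enum import Enum


class Route(Enum):
    COMPUTATION = "COMPUTATION"  # deterministic full-scan engine
    RETRIEVAL = "RETRIEVAL"      # RAG lookup


# Assumed signal lists; a real router would tune these to its own queries.
AGGREGATION = re.compile(
    r"\b(?:total|sum|average|avg|mean|median|count|how\s+many|"
    r"maximum|minimum|max|min|highest|lowest)\b",
    re.IGNORECASE,
)
NUMERIC_COMPARISON = re.compile(
    r"(?:>=|<=|>|<|\b(?:greater|more|less|fewer)\s+than\b|"
    r"\b(?:over|under|above|below|at\s+least|at\s+most)\s+\$?\d)",
    re.IGNORECASE,
)
RETRIEVAL = re.compile(
    r"\b(?:show|find|describe|look\s*up|details?|who\s+is)\b",
    re.IGNORECASE,
)


def route(query: str) -> Route:
    """Classify a query by intent signals found in its text."""
    # Assumption: computation signals win when both kinds are present.
    if AGGREGATION.search(query) or NUMERIC_COMPARISON.search(query):
        return Route.COMPUTATION
    if RETRIEVAL.search(query):
        return Route.RETRIEVAL
    # Ambiguous queries default to computation, per the method above.
    return Route.COMPUTATION


examples = [
    "What is the total spend?",
    "Which customers spent more than 500?",
    "Show me the details for order 1042",
    "vendors in Q3",
]
routes = [route(q) for q in examples]
```

Defaulting to computation means an ambiguous query is answered by the exact engine instead of risking a confidently wrong RAG answer.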

Best for: AI Engineer, Machine Learning Engineer, Data Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.