Context Windows Aren't Enough: Why RAG Matters for High-Stakes AI

2026-06-09 · Source: The TWIML AI Podcast with Sam Charrington · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

Sphere's Tax Review and Assessment Model (TRAM) is an internal AI system that automates sales tax compliance across numerous US and international jurisdictions. Tax legislation is granular and constantly changing, and TRAM lets Sphere's tax experts work two orders of magnitude faster than traditional manual methods, with fewer errors. The system relies heavily on Retrieval-Augmented Generation (RAG), which remains critical for high accuracy and exact citations even as context windows expand. TRAM's pipeline ingests diverse legal documents, translates them into English, and semantically chunks the content while preserving its hierarchy. It then generates both dense and sparse embeddings and stores them in a vector database. Human tax experts review TRAM's determinations, reasoning, and citations, and their feedback directly informs Reinforcement Fine-Tuning (RFT) of OpenAI models to keep improving accuracy.

Key takeaway

AI/ML engineers building high-stakes, auditable systems, where accuracy and citation are critical, should not treat Retrieval-Augmented Generation (RAG) as obsolete. Focus on sophisticated RAG implementations, including semantic chunking and combined dense and sparse embeddings, to ensure precision. Feed human expert review back into Reinforcement Fine-Tuning (RFT) to raise accuracy further. This approach significantly outperforms relying solely on larger context windows.

Key insights

RAG is indispensable for high-stakes, citation-sensitive domains, even with large context windows, due to its accuracy and explainability.

Principles

Accuracy and citation are paramount in legal AI.
Semantic chunking outperforms naive methods for structured documents.
Combine dense and sparse embeddings for robust retrieval.
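
As an illustration of the third principle, here is a minimal sketch of one common way to merge dense and sparse retrieval results: reciprocal rank fusion (RRF). The summary does not say how TRAM combines the two signals, so the fusion method, the constant k=60, and the chunk ids below are all assumptions made for the example.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of chunk ids into a single ranking.

    Each chunk scores sum(1 / (k + rank)) over the lists it appears in,
    so a chunk ranked well by both retrievers rises to the top.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so output is deterministic.
    return sorted(scores, key=lambda cid: (-scores[cid], cid))


# Hypothetical chunk ids, as if returned by a dense (embedding) search
# and a sparse (keyword) search for the same query.
dense_hits = ["ny-1105-c", "ny-1101-b", "tx-151-0101"]
sparse_hits = ["tx-151-0101", "ny-1105-c", "ca-6051"]

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Rank-based fusion avoids having to put dense similarity scores and sparse keyword scores on a common scale, which is one reason it is a frequent default for hybrid retrieval.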

Method

The TRAM pipeline ingests diverse legal documents, translates, semantically chunks, generates dense and sparse embeddings, and stores them for query-driven retrieval and LLM-based determination.
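
The chunking step of this pipeline can be sketched as follows. This is a simplified stand-in, not TRAM's implementation: it assumes headings begin with the words Title, Chapter, or Section, and the sample statute text is invented. The point it shows is that each chunk carries its full heading trail, which is what makes an exact citation possible after retrieval.

```python
import re
from dataclasses import dataclass

# Assumed heading convention for the sketch; real statutes vary widely.
HEADING = re.compile(r"^(Title|Chapter|Section)\s+\S+", re.IGNORECASE)
LEVELS = {"title": 0, "chapter": 1, "section": 2}


@dataclass
class Chunk:
    path: tuple  # heading trail, outermost first
    text: str    # body text found under the innermost heading


def chunk_by_hierarchy(document: str) -> list[Chunk]:
    chunks = []
    path = []  # stack of (level, heading line)
    body = []  # body lines collected under the current heading

    def flush():
        text = " ".join(body).strip()
        if text:
            chunks.append(Chunk(tuple(h for _, h in path), text))
        body.clear()

    for raw in document.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = HEADING.match(line)
        if match:
            flush()
            level = LEVELS[match.group(1).lower()]
            # A new heading closes any open heading at the same or deeper level.
            while path and path[-1][0] >= level:
                path.pop()
            path.append((level, line))
        else:
            body.append(line)
    flush()
    return chunks


# Invented sample text, for illustration only.
sample = """Title 12 Sales and Use Tax
Chapter 3 Exemptions
Section 3.1 Food products
Sales of food for home consumption are exempt.
Section 3.2 Clothing
Clothing priced under the threshold is exempt.
Chapter 4 Rates
Section 4.1 General rate
The general rate applies unless an exemption is listed.
"""

chunks = chunk_by_hierarchy(sample)
for chunk in chunks:
    print(" > ".join(chunk.path), "|", chunk.text)
```

In a fuller pipeline each chunk's text would then be embedded (dense and sparse) and stored with its path as metadata.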

In practice

Implement semantic chunking tailored to document structure.
Utilize both dense and sparse embeddings for retrieval.
Integrate human expert feedback for RFT to boost accuracy.
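
To make the last item concrete: reinforcement fine-tuning needs a numeric score for each model output, and expert review is a natural source for it. The sketch below is a hypothetical grader, not Sphere's. The field names, the 0.7/0.3 weighting, and the use of citation F1 are assumptions chosen only to show how an expert-corrected determination could become a reward signal.

```python
def grade_determination(predicted: dict, expert: dict,
                        decision_weight: float = 0.7) -> float:
    """Score a model determination against an expert-reviewed one, in [0, 1].

    Both dicts are assumed to hold a boolean "taxable" and a list of
    "citations". The weights are illustrative, not tuned values.
    """
    decision_ok = 1.0 if predicted["taxable"] == expert["taxable"] else 0.0

    cited = set(predicted["citations"])
    expected = set(expert["citations"])
    if cited and expected:
        overlap = len(cited & expected)
        precision = overlap / len(cited)
        recall = overlap / len(expected)
        total = precision + recall
        citation_f1 = 2 * precision * recall / total if total else 0.0
    else:
        # With an empty side, only an exact (empty) match earns credit.
        citation_f1 = 1.0 if cited == expected else 0.0

    return decision_weight * decision_ok + (1 - decision_weight) * citation_f1


# Hypothetical review record: right decision, one extra citation.
model_output = {"taxable": False, "citations": ["ny-1105-c", "ny-1101-b"]}
expert_review = {"taxable": False, "citations": ["ny-1105-c"]}

score = grade_determination(model_output, expert_review)
print(round(score, 3))
```

Scoring citations separately from the decision rewards outputs that are both correct and properly sourced, which matches the emphasis on auditability in this summary.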

Topics

Retrieval-Augmented Generation
Sales Tax Compliance
Large Language Models
Reinforcement Fine-Tuning
Semantic Chunking
Legal AI

Best for: AI Architect, AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by The TWIML AI Podcast with Sam Charrington.