Dispatching the Parsed RAG Question: Chunk Strategy, Model Tier, Activations, Audit
Summary
This article details the "dispatch" component of the "question parsing" brick in an enterprise RAG system, building on earlier discussions of question parsing and extraction. It describes how a dispatcher uses a document's profile (e.g., doc_type, typical_fields, summary) to make decisions that go beyond the initial parse: determining the answer context (detection_context, answer_context, needs_summary), selecting a chunking strategy (sequential vs. combined LLM calls), and choosing the LLM model tier (nano, mini, standard, reasoning) from a precise registry. The system also downgrades "activations" (e.g., extract_page_numbers, use_toc_navigation) when the document's actual properties do not support them (e.g., Word vs. PDF). The architecture favors a deterministic dispatcher (Approach B) for reproducibility, auditability, and cost control, and rejects LLM-decided routing for enterprise contexts. An _meta block in the output records every routing decision so that each one can be audited. In production, LLM calls are consolidated for efficiency; the reported average parsing latency is 280 ms, and accuracy rises from 76% with parsing off to 91% with parsing on.
Key takeaway
AI architects designing enterprise RAG systems should prefer a deterministic question dispatcher over autonomous LLM-driven routing. Such a dispatcher adjusts the chunking strategy, LLM model tier, and execution activations from the parsed question and the document profile, which gives the reproducibility, auditability, and predictable cost that enterprise settings require. Log every dispatch decision in an _meta block to support debugging and compliance. Together, these choices improve answer accuracy and system reliability.
Key insights
Effective RAG dispatch adapts chunking, model choice, and activations to the parsed question and the document profile, while keeping every decision auditable and costs predictable.
Principles
- Document profiles inform parsing decisions.
- Deterministic dispatch ensures auditability and cost control.
- Dynamically downgrade activations based on document properties.
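The third principle can be illustrated with a minimal sketch. Only the activation names extract_page_numbers and use_toc_navigation come from the article; the has_page_numbers and has_toc flags, the shape of the DocumentProfile class, and the downgrade_activations function are assumptions made for illustration, not the author's implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DocumentProfile:
    # doc_type appears in the article; the two flags below are hypothetical.
    doc_type: str
    has_page_numbers: bool
    has_toc: bool


def downgrade_activations(
    requested: dict[str, bool], profile: DocumentProfile
) -> tuple[dict[str, bool], list[str]]:
    """Switch off requested activations that the document cannot support."""
    # Assumed mapping from activation name to the property it depends on.
    supported_by_profile = {
        "extract_page_numbers": profile.has_page_numbers,
        "use_toc_navigation": profile.has_toc,
    }
    effective: dict[str, bool] = {}
    downgraded: list[str] = []
    for name, wanted in requested.items():
        # Activations with no known requirement pass through unchanged.
        supported = supported_by_profile.get(name, True)
        effective[name] = wanted and supported
        if wanted and not supported:
            downgraded.append(name)
    return effective, downgraded


# Example: a profile without reliable page numbers but with a table of contents.
word_doc = DocumentProfile(doc_type="word", has_page_numbers=False, has_toc=True)
plan, dropped = downgrade_activations(
    {"extract_page_numbers": True, "use_toc_navigation": True}, word_doc
)
```

Returning the list of downgraded activations alongside the effective plan keeps the adjustment visible to whatever records the routing decisions.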
Method
The system uses a deterministic dispatcher to resolve chunk_strategy, answer_context, and suggested_model via a cascade (concept-level override > shape/type default > project fallback), and adjusts ExecutionPlan activations based on DocumentProfile.
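The cascade can be sketched as a plain lookup over three layers, with the first layer that defines a field winning. The field names and the tier and strategy values (nano, mini, standard, reasoning, sequential, combined) come from the article; the table contents, the concept and shape keys, the answer_context values, and the resolve function are hypothetical placeholders.

```python
from typing import Optional

# Hypothetical tables: keys and most values are illustrative only.
CONCEPT_OVERRIDES = {
    "termination_clause": {"chunk_strategy": "combined", "suggested_model": "reasoning"},
}
SHAPE_DEFAULTS = {
    "list": {
        "chunk_strategy": "sequential",
        "answer_context": "section",
        "suggested_model": "mini",
    },
    "single_value": {"chunk_strategy": "combined", "suggested_model": "nano"},
}
PROJECT_FALLBACK = {
    "chunk_strategy": "combined",
    "answer_context": "paragraph",
    "suggested_model": "standard",
}

FIELDS = ("chunk_strategy", "answer_context", "suggested_model")


def resolve(concept: Optional[str], shape: Optional[str]) -> dict:
    """Resolve each field independently: override > shape default > fallback."""
    layers = [
        ("concept_override", CONCEPT_OVERRIDES.get(concept, {})),
        ("shape_default", SHAPE_DEFAULTS.get(shape, {})),
        ("project_fallback", PROJECT_FALLBACK),
    ]
    decision = {}
    for field_name in FIELDS:
        for source, table in layers:
            if field_name in table:
                # Keep the winning layer next to the value for later auditing.
                decision[field_name] = {"value": table[field_name], "source": source}
                break
    return decision


decision = resolve("termination_clause", "list")
```

Because the resolution is a pure function of its inputs and static tables, the same question and profile always produce the same routing, which is the property the deterministic approach is chosen for.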
In practice
- Maintain satellite tables for model tiers and types.
- Log all parsing and dispatch decisions in a _meta block.
- Integrate expert dictionaries for domain-specific terms.
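The logging practice above might look like the following sketch. The article names the _meta block but this digest does not give its schema, so the keys inside it (dispatcher, decisions, downgraded_activations) and both function names are assumptions.

```python
import json


def build_output(plan: dict, decisions: dict, downgraded: list) -> dict:
    """Attach a hypothetical _meta block to the dispatcher output."""
    return {
        **plan,
        "_meta": {
            "dispatcher": "deterministic",
            "decisions": decisions,
            # Sorted so that identical inputs always serialize identically.
            "downgraded_activations": sorted(downgraded),
        },
    }


def to_audit_line(output: dict) -> str:
    """Serialize only the _meta block as one stable JSON line for a log."""
    return json.dumps(output["_meta"], sort_keys=True)


output = build_output(
    plan={"chunk_strategy": "combined", "suggested_model": "mini"},
    decisions={"suggested_model": {"value": "mini", "source": "shape_default"}},
    downgraded=["use_toc_navigation", "extract_page_numbers"],
)
audit_line = to_audit_line(output)
```

Sorting keys and list entries before serializing means two runs with the same inputs yield byte-identical log lines, which makes them straightforward to diff during an audit.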
Topics
- RAG Architecture
- Question Parsing
- LLM Dispatch
- Document Intelligence
- Auditability
- Chunking Strategy
Best for: AI Engineer, Machine Learning Engineer, AI Architect
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.