Dispatching the Parsed RAG Question: Chunk Strategy, Model Tier, Activations, Audit

2026-06-18 · Source: Towards Data Science · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, extended

Summary

This article covers the dispatch stage of the question-parsing brick in an enterprise RAG system, building on earlier discussions of question parsing and extraction. The dispatcher combines the parsed question with the document's profile (e.g., doc_type, typical_fields, summary) to make the decisions that parsing alone leaves open: which answer context to use (detection_context, answer_context, needs_summary), which chunking strategy to apply (sequential vs. combined LLM calls), and which LLM model tier to select (nano, mini, standard, reasoning) from a model registry. It also downgrades "activations" (e.g., extract_page_numbers, use_toc_navigation) based on the document's actual properties (e.g., Word vs. PDF). The architecture favors a deterministic dispatcher (Approach B) for reproducibility, auditability, and cost control, and rejects LLM-decided routing in enterprise contexts. A _meta block in the output records every routing decision for audit. Production systems consolidate LLM calls for efficiency; the reported results are an average parsing latency of 280 ms and an accuracy improvement from 76% to 91% relative to running with parsing disabled.

Key takeaway

For AI architects designing enterprise RAG systems: prefer a deterministic question dispatcher over autonomous LLM-driven routing. A dispatcher that sets chunking strategy, LLM model tier, and execution activations from the parsed question and the document profile provides reproducibility, auditability, and predictable cost. Log every dispatch decision in a _meta block to support debugging and compliance. The article presents this design as one that also improves answer accuracy and system reliability.

Key insights

Effective RAG dispatch dynamically adapts chunking, model choice, and activations based on parsed questions and document profiles for auditability and cost efficiency.

Principles

Document profiles inform parsing decisions.
Deterministic dispatch ensures auditability and cost control.
Dynamically downgrade activations based on document properties.
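
The third principle can be illustrated with a minimal sketch. Only the activation names extract_page_numbers and use_toc_navigation and the DocumentProfile name come from the article; the file_format and has_toc fields, the Activations class, the downgrade_activations function, and the specific rules (page numbers only for PDF, TOC navigation only when a table of contents exists) are assumptions made for illustration, not the author's implementation.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DocumentProfile:
    doc_type: str
    file_format: str  # hypothetical field, e.g. "pdf" or "docx"
    has_toc: bool     # hypothetical field


@dataclass(frozen=True)
class Activations:
    extract_page_numbers: bool = False
    use_toc_navigation: bool = False


def downgrade_activations(requested, profile):
    """Return (effective activations, list of downgrade notes).

    The rules below are illustrative assumptions: a requested activation
    is switched off when the document cannot support it.
    """
    effective = requested
    notes = []
    if requested.extract_page_numbers and profile.file_format != "pdf":
        effective = replace(effective, extract_page_numbers=False)
        notes.append("extract_page_numbers off: format is " + profile.file_format)
    if requested.use_toc_navigation and not profile.has_toc:
        effective = replace(effective, use_toc_navigation=False)
        notes.append("use_toc_navigation off: no table of contents")
    return effective, notes


requested = Activations(extract_page_numbers=True, use_toc_navigation=True)
word_doc = DocumentProfile(doc_type="contract", file_format="docx", has_toc=True)
effective, notes = downgrade_activations(requested, word_doc)
```

Returning the notes alongside the effective activations keeps each downgrade explainable, which matches the auditability goal stated above.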

Method

The system uses a deterministic dispatcher to resolve chunk_strategy, answer_context, and suggested_model via a cascade (concept-level override > shape/type default > project fallback), and adjusts ExecutionPlan activations based on DocumentProfile.
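
The cascade described above could be sketched as a lookup that stops at the first level holding a value. The precedence order (concept-level override, then shape/type default, then project fallback) and the tier names are taken from the article; the table contents, the concept and type keys, the "chunk" value for answer_context, and the resolve function are hypothetical placeholders.

```python
# Hypothetical tables; a real system would load these from configuration.
PROJECT_FALLBACK = {
    "chunk_strategy": "sequential",
    "answer_context": "chunk",      # placeholder value, not from the article
    "suggested_model": "mini",
}
TYPE_DEFAULTS = {
    "date": {"suggested_model": "nano"},
    "table": {"chunk_strategy": "combined", "suggested_model": "standard"},
}
CONCEPT_OVERRIDES = {
    "termination_clause": {"suggested_model": "reasoning"},
}


def resolve(field, concept=None, value_type=None):
    """Resolve one setting and report which cascade level supplied it."""
    levels = (
        ("concept", CONCEPT_OVERRIDES, concept),
        ("type", TYPE_DEFAULTS, value_type),
    )
    for level, table, key in levels:
        if key is not None and field in table.get(key, {}):
            return table[key][field], level
    # Nothing more specific matched, so use the project-wide value.
    return PROJECT_FALLBACK[field], "project"


model, source = resolve("suggested_model", concept="termination_clause", value_type="date")
```

Because the function is a pure table lookup, the same question and profile always yield the same routing, and the returned level name can be recorded for audit.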

In practice

Maintain satellite tables for model tiers and types.
Log all parsing and dispatch decisions in a _meta block.
Integrate expert dictionaries for domain-specific terms.
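
As a rough sketch of the logging item above, a _meta block might be assembled as a plain dictionary and serialized with the answer. The article names the _meta block but this summary does not give its schema, so every key below (dispatcher, decided_at, decisions, downgrades) and the build_meta function are assumptions for illustration.

```python
import json
from datetime import datetime, timezone


def build_meta(decisions, downgrades):
    """Wrap routing decisions in a JSON-serializable _meta block.

    `decisions` maps a setting name to {"value": ..., "source": ...};
    `downgrades` is a list of human-readable notes. Both shapes are assumed.
    """
    return {
        "_meta": {
            "dispatcher": "deterministic",
            "decided_at": datetime.now(timezone.utc).isoformat(),
            "decisions": decisions,
            "downgrades": list(downgrades),
        }
    }


meta = build_meta(
    decisions={
        "suggested_model": {"value": "nano", "source": "type"},
        "chunk_strategy": {"value": "sequential", "source": "project"},
    },
    downgrades=["extract_page_numbers off: format is docx"],
)
serialized = json.dumps(meta, indent=2)
```

Storing the source of each value next to the value itself lets a reviewer see why a given model tier or chunk strategy was chosen without rerunning the dispatcher.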

Topics

RAG Architecture
Question Parsing
LLM Dispatch
Document Intelligence
Auditability
Chunking Strategy

Best for: AI Engineer, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards Data Science.