Scaling AI Knowledge Systems: Lessons from the DHTMLX MCP Server

2026-03-19 · Source: Artificial Intelligence in Plain English - Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Advanced, medium

Summary

DHTMLX built the MCP Server (MCP stands for Model Context Protocol), a centralized knowledge layer that gives AI assistants and developer tools structured access to up-to-date documentation across its product line, including the Suite widgets, Gantt, and Scheduler. The system uses Retrieval-Augmented Generation (RAG): it retrieves the documentation fragments most relevant to a query and passes them to a Large Language Model (LLM), which generates an answer grounded in that context. RAG worked well for a single product, but scaling it across several distinct products, each with its own documentation, added complexity. An initial attempt with a single vector index for all products hurt accuracy because the retrieved context mixed material from different products. Splitting the knowledge into product-specific indexes improved accuracy but required a fast, flexible way to route each query to the right index. DHTMLX addressed this with a custom machine learning model for domain classification, optimized for low latency and high accuracy through distillation and 8-bit quantization; it performs comparably to TinyBERT while having a smaller footprint.
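A minimal sketch of the retrieve-then-generate flow with one index per product follows. It assumes toy, hypothetical documentation snippets and uses plain bag-of-words cosine similarity in place of real embeddings and a vector database; `INDEXES`, `retrieve`, and `build_prompt` are illustrative names, not part of the DHTMLX MCP Server. What it shows is that retrieval is scoped to a single product's index, so fragments from other products never reach the prompt.

```python
import math
import re
from collections import Counter

# Hypothetical toy documentation, one separate index per product.
INDEXES = {
    "gantt": [
        "Gantt tasks can be linked with dependencies that define task order.",
        "The Gantt timeline scale can show days, weeks, or months.",
    ],
    "scheduler": [
        "Scheduler events can repeat using a recurrence rule.",
        "The Scheduler month view shows events for a whole month.",
    ],
    "suite": [
        "The Suite Grid widget supports sorting by column.",
        "The Suite Form widget validates user input.",
    ],
}


def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words term counts (stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, product: str, k: int = 1) -> list[str]:
    """Return the top-k fragments from ONE product's index only."""
    q = tokenize(query)
    ranked = sorted(INDEXES[product], key=lambda doc: cosine(q, tokenize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, product: str) -> str:
    """Ground the LLM prompt in fragments from the routed product's index."""
    context = "\n".join(retrieve(query, product))
    return f"Answer using only this {product} documentation:\n{context}\n\nQuestion: {query}"
```

With this toy data, `retrieve("How do I make events repeat?", "scheduler")` returns the recurrence-rule fragment, and the prompt built for that query contains no Gantt or Suite text; the string from `build_prompt` is what an LLM call would then consume.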

Key takeaway

For AI engineers building RAG systems that span multiple distinct product lines, relying on a single, unified knowledge index will likely degrade answer quality. Instead, segment the knowledge base by product and put a lightweight, specialized machine learning model in front of it to route each query. Combined with distillation and quantization, this keeps routing both accurate and fast enough for real-time AI assistance, prevents context from different products mixing, and makes developers more efficient.
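To show where a router sits in front of per-product indexes, here is a sketch that swaps in a multinomial Naive Bayes classifier with Laplace smoothing. This is not the model DHTMLX describes (theirs is a distilled, 8-bit quantized classifier compared against TinyBERT); the training queries, labels, and the `NaiveBayesRouter` class are hypothetical and chosen only to keep the example small and dependency-free.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical labeled queries; a real router would need far more data.
TRAINING = [
    ("how do I link tasks with dependencies", "gantt"),
    ("change the timeline scale to weeks", "gantt"),
    ("make an event repeat every monday", "scheduler"),
    ("show the month view of events", "scheduler"),
    ("sort grid columns in the widget", "suite"),
    ("validate input in a form widget", "suite"),
]


def tokens(text: str) -> list[str]:
    return re.findall(r"[a-z]+", text.lower())


class NaiveBayesRouter:
    """Multinomial Naive Bayes with Laplace smoothing (stand-in router)."""

    def __init__(self, examples: list[tuple[str, str]]):
        self.word_counts: dict[str, Counter] = defaultdict(Counter)  # label -> word counts
        self.label_counts: Counter = Counter()
        for text, label in examples:
            self.label_counts[label] += 1
            self.word_counts[label].update(tokens(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = sum(self.label_counts.values())

    def scores(self, query: str) -> dict[str, float]:
        """Log-probability score for each product label."""
        v = len(self.vocab)
        result = {}
        for label, n in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + v
            score = math.log(n / self.total)  # log prior
            for w in tokens(query):
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            result[label] = score
        return result

    def route(self, query: str) -> str:
        """Pick the product index with the highest score."""
        s = self.scores(query)
        return max(s, key=s.get)


router = NaiveBayesRouter(TRAINING)
```

With this toy training set, `router.route("make a meeting repeat weekly")` returns `"scheduler"`, and that label then selects which product index to search. The design point carries over to a neural router: classification is a single cheap forward pass, so it adds little latency before retrieval.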

Key insights

Scaling RAG systems for multiple distinct products requires intelligent query routing and optimized knowledge structuring.

Principles

Separate knowledge bases for distinct product domains.
Balance model size and accuracy for real-time routing.

Method

A custom machine learning model for domain classification was developed using distillation and 8-bit quantization to achieve high accuracy and low latency for routing queries to product-specific RAG indexes.
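The distillation step can be illustrated with the common soft-target formulation (Hinton et al., 2015): a small student classifier is trained to match a larger teacher's temperature-softened output distribution as well as the true labels. The sketch below computes that loss for a single example; the temperature and `alpha` weighting are assumed defaults, not values reported for the DHTMLX router, and the function names are hypothetical.

```python
import math


def log_softmax(logits: list[float], temperature: float = 1.0) -> list[float]:
    """Numerically stable log-softmax of temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    log_total = m + math.log(sum(math.exp(z - m) for z in scaled))
    return [z - log_total for z in scaled]


def distillation_loss(
    student_logits: list[float],
    teacher_logits: list[float],
    true_label: int,
    temperature: float = 2.0,  # assumed value
    alpha: float = 0.5,  # assumed weight on the soft-target term
) -> float:
    """Soft-target KL divergence blended with hard-label cross-entropy."""
    log_pt = log_softmax(teacher_logits, temperature)
    log_ps = log_softmax(student_logits, temperature)
    # KL(teacher || student) between the temperature-softened distributions.
    kl = sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_pt, log_ps))
    # Ordinary cross-entropy against the true class at temperature 1.
    ce = -log_softmax(student_logits)[true_label]
    # Scaling by T^2 keeps soft-target gradients comparable across temperatures.
    return alpha * temperature**2 * kl + (1 - alpha) * ce
```

When the student's logits equal the teacher's, the KL term is zero and only the cross-entropy term remains; the further the student drifts from the teacher, the larger the loss.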

In practice

Implement product-specific vector indexes.
Use distillation to train smaller models that remain capable.
Apply 8-bit quantization to reduce model size.
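8-bit quantization can be sketched as symmetric, per-tensor int8 quantization: each float weight is stored as an integer in [-127, 127] plus one shared float scale. This is one common scheme, assumed here for illustration; the source does not say which 8-bit scheme DHTMLX applied, and `quantize_int8` and `dequantize_int8` are hypothetical helper names.

```python
from array import array


def quantize_int8(weights: list[float]) -> tuple[array, float]:
    """Symmetric per-tensor quantization: w is approximated by scale * q."""
    max_abs = max((abs(w) for w in weights), default=0.0)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clip to the signed 8-bit range.
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize_int8(q: array, scale: float) -> list[float]:
    """Recover approximate float weights from int8 values and the scale."""
    return [scale * v for v in q]
```

Storing the weights in a signed-byte `array` uses 1 byte per value instead of 4 for float32, roughly a 4x size reduction, at the cost of a rounding error of at most half the scale per weight.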

Topics

Retrieval-Augmented Generation
Model Context Protocol
Machine Learning Models
Model Distillation
Quantization

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence in Plain English - Medium.