Scaling AI Knowledge Systems: Lessons from the DHTMLX MCP Server

Source: Artificial Intelligence in Plain English - Medium · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering) · Depth: Advanced, medium

Summary

DHTMLX built its MCP Server as a centralized knowledge layer that gives AI assistants and developer tools structured access to current documentation across its product line, including Suite widgets, Gantt, and Scheduler. The system uses Retrieval-Augmented Generation (RAG): it retrieves relevant documentation fragments and feeds them to a Large Language Model (LLM) so that responses are grounded in that context. RAG worked well for a single product, but scaling it across multiple distinct products, each with its own documentation, introduced complexity. An initial attempt with a single vector index produced accuracy problems because contexts from different products were mixed. Splitting the knowledge into product-specific indexes improved accuracy but required a fast, flexible way to route each query to the right index. DHTMLX addressed this with a custom machine learning model for domain classification, optimized for low latency and high accuracy through distillation and 8-bit quantization. The resulting model achieves performance comparable to TinyBERT with a smaller footprint.
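The routing idea in the summary can be illustrated with a minimal sketch. It assumes a toy bag-of-words classifier in place of the custom model, and small hard-coded lists in place of real vector indexes. The names `INDEXES`, `route`, and `retrieve` and the sample fragments are hypothetical and do not come from the DHTMLX implementation.

```python
import math
from collections import Counter

# Hypothetical per-product fragments; stand-ins for product-specific vector indexes.
INDEXES = {
    "gantt": [
        "Gantt tasks are linked with dependencies and shown on a timeline.",
        "Configure the Gantt critical path and task durations.",
    ],
    "scheduler": [
        "Scheduler displays calendar events in day, week, and month views.",
        "Recurring events in Scheduler repeat on a calendar pattern.",
    ],
    "suite": [
        "Suite widgets include Grid, Form, and Chart components.",
        "The Grid widget supports sorting and filtering of columns.",
    ],
}


def bow(text):
    """Lowercased bag-of-words vector with basic punctuation stripped."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return Counter(w for w in words if w)


def cosine(a, b):
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def route(query):
    """Toy domain classifier: pick the product whose pooled text best matches."""
    q = bow(query)
    scores = {p: cosine(q, bow(" ".join(docs))) for p, docs in INDEXES.items()}
    return max(scores, key=scores.get)


def retrieve(query, top_k=1):
    """Route first, then search only the selected product's fragments."""
    product = route(query)
    q = bow(query)
    ranked = sorted(INDEXES[product], key=lambda d: cosine(q, bow(d)), reverse=True)
    return product, ranked[:top_k]
```

Because retrieval only ever touches one product's fragments, a Scheduler question cannot pull in Gantt text, which is the context-mixing problem the single-index design ran into.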

Key takeaway

If you are an AI engineer building a RAG system that spans multiple distinct product lines, a single unified knowledge index will likely degrade answer quality. Segment the knowledge base by product instead, and route queries with a lightweight, specialized machine learning model. Techniques such as distillation and quantization keep that router both accurate and fast enough for real-time AI assistance, which prevents context mixing and improves developer efficiency.
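For readers unfamiliar with distillation, the following sketch shows the standard soft-target loss for one training example: a small student model is trained to match a larger teacher's temperature-softened output distribution as well as the true label. The source does not describe DHTMLX's training setup, so the function names, the `temperature` and `alpha` defaults, and the example logits are all assumptions for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with temperature scaling."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, true_label,
                      temperature=2.0, alpha=0.5):
    """Weighted sum of a soft-target term and a hard-label term.

    Soft term: KL(teacher || student) at the given temperature, scaled by
    temperature squared. Hard term: cross-entropy on the true label.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps)
             for pt, ps in zip(p_teacher, p_student) if pt > 0)
    cross_entropy = -math.log(softmax(student_logits)[true_label])
    return alpha * temperature ** 2 * kl + (1 - alpha) * cross_entropy


# Hypothetical logits over three product classes (e.g. Gantt, Scheduler, Suite).
teacher = [4.0, 1.0, 0.5]
loss_close = distillation_loss([4.0, 1.0, 0.5], teacher, true_label=0)
loss_far = distillation_loss([0.5, 1.0, 4.0], teacher, true_label=0)
```

A student whose logits track the teacher's gets a lower loss than one that ranks the classes differently, so minimizing this loss pulls the small model toward the large model's behavior.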

Key insights

Scaling RAG systems for multiple distinct products requires intelligent query routing and optimized knowledge structuring.

Method

DHTMLX developed a custom domain-classification model, using distillation and 8-bit quantization to achieve high accuracy and low latency when routing queries to product-specific RAG indexes.
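As a rough illustration of what 8-bit quantization does to a model's weights, the sketch below applies symmetric per-tensor quantization to a short list of floats. The source does not say which quantization scheme DHTMLX used, so this particular scheme, the function names, and the sample weights are assumptions.

```python
def quantize_int8(weights):
    """Map floats to integers in [-127, 127] using one shared scale.

    Assumes a non-empty list of weights.
    """
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the 8-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical weights from one layer of a small classifier.
weights = [0.82, -1.27, 0.003, 0.5]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight is stored in one byte instead of four, and the rounding error per weight is bounded by half the scale. That trade of a small accuracy loss for a smaller, faster model is what makes quantization attractive for a low-latency router.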

Best for: AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence in Plain English - Medium.