A RAG Chatbot with Incremental Context Retrieval based on Local LLMs for Hospital Documents
Summary
A retrieval-augmented generation (RAG) chatbot that runs exclusively on local LLMs was developed and evaluated on internal Portuguese-language documents from a university hospital, including Standard Operating Procedures and technical manuals. The system targets hospital requirements for information security, computational efficiency, and control over sensitive data. Retrieval quality was evaluated for dense embedding models using the Mean Reciprocal Rank (MRR) metric. Generation was analyzed in two scenarios: fixed context, where multiple chunks are provided to the model at once, and incremental page retrieval, where chunks are sent sequentially in retrieval-rank order. Four local LLMs were tested (MedGemma3:27B, Gemma3:27B, Gpt-oss:20B, and Mistral Small 3.1), with BERTScore measuring generation quality. In the fixed-context scenario, increasing the context indiscriminately degraded generation quality, while incremental page retrieval improved BERTScore values, with MedGemma3:27B performing best.
Key takeaway
For AI Architects and NLP Engineers designing RAG systems for sensitive domains like healthcare, how much context reaches the model matters as much as what is retrieved. Expanding the context indiscriminately can degrade output quality, even if it raises the probability that the relevant passage is included. In this study, incremental page retrieval produced better generation quality than a fixed context, and MedGemma3:27B scored best among the four local LLMs tested. Running the models locally keeps sensitive hospital data under the institution's control.
Key insights
Adaptive context control is crucial for reliable and efficient RAG systems using local LLMs in healthcare.
Principles
- Adding context indiscriminately in a fixed-context setup degraded generation quality.
- Sending chunks sequentially in retrieval-rank order improved generation quality (BERTScore) over a fixed context.
Method
The study evaluated RAG generation using fixed context versus incremental page retrieval, measuring retrieval with MRR and generation with BERTScore across four local LLMs.
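For readers unfamiliar with the retrieval metric, MRR averages the reciprocal of the rank at which the first relevant item appears for each query. The sketch below is illustrative only: the function name, the page identifiers, and the assumption of a single relevant page per query are not taken from the paper.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    rankings: Sequence[Sequence[Hashable]],
    relevant: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the relevant item per query (0 when it is absent).

    Assumption: each query has exactly one relevant item.
    """
    if len(rankings) != len(relevant):
        raise ValueError("rankings and relevant must have the same length")
    if not rankings:
        return 0.0

    total = 0.0
    for ranking, gold in zip(rankings, relevant):
        # Ranks are 1-based: the top result contributes 1.0, the second 0.5, ...
        for rank, item in enumerate(ranking, start=1):
            if item == gold:
                total += 1.0 / rank
                break
    return total / len(rankings)


# Hypothetical page identifiers for three queries.
rankings = [
    ["sop_12_p3", "sop_04_p1", "manual_p9"],  # relevant page at rank 1
    ["manual_p2", "sop_07_p5", "sop_01_p1"],  # relevant page at rank 2
    ["sop_03_p2", "sop_03_p4"],               # relevant page not retrieved
]
relevant = ["sop_12_p3", "sop_07_p5", "sop_99_p1"]
mrr = mean_reciprocal_rank(rankings, relevant)  # (1 + 0.5 + 0) / 3
```

An MRR close to 1 means the relevant page usually appears at the top of the ranking, which is what makes sending chunks in rank order worthwhile.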
In practice
- Prioritize local LLMs for sensitive hospital data.
- Implement incremental context retrieval in RAG systems.
- Consider MedGemma3:27B for healthcare RAG applications.
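One way the incremental retrieval idea could be sketched is shown below. The summary does not describe how the system decides to stop, so the loop rests on loudly stated assumptions: chunks are sent one per call in rank order, the model is instructed to reply with a hypothetical `NOT_FOUND` sentinel when a chunk does not contain the answer, and `generate` stands in for any call to a local LLM. None of these names come from the paper.

```python
from typing import Callable, Optional, Sequence, Tuple

NOT_FOUND = "NOT_FOUND"  # hypothetical sentinel, not from the paper


def build_prompt(question: str, context: str) -> str:
    """Prompt that asks the model to answer only from the given context."""
    return (
        "Answer the question using only the context below. "
        f"If the context does not contain the answer, reply {NOT_FOUND}.\n\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


def answer_incrementally(
    question: str,
    ranked_chunks: Sequence[str],
    generate: Callable[[str], str],
    max_chunks: int = 5,
) -> Tuple[Optional[str], int]:
    """Send chunks one at a time, best-ranked first, until the model answers.

    Returns (answer, chunks_tried); answer is None if no chunk sufficed.
    """
    tried = 0
    for chunk in ranked_chunks[:max_chunks]:
        tried += 1
        reply = generate(build_prompt(question, chunk)).strip()
        if reply and reply != NOT_FOUND:
            return reply, tried
    return None, tried


def fake_generate(prompt: str) -> str:
    """Stand-in for a local LLM call, used only to exercise the loop."""
    return "500 mg" if "500 mg" in prompt else NOT_FOUND


chunks = [
    "Visiting hours are from 2 pm to 4 pm.",
    "The standard dose is 500 mg every 8 hours.",
]
result = answer_incrementally("What is the standard dose?", chunks, fake_generate)
```

Compared with concatenating the top-k chunks into one prompt, this keeps each prompt short and stops as soon as an answer is found, at the cost of extra model calls when the relevant chunk is ranked low.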
Topics
- RAG Chatbot
- Local LLMs
- Incremental Context Retrieval
- Hospital Documents
- MedGemma3:27B
Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.