A RAG Chatbot with Incremental Context Retrieval based on Local LLMs for Hospital Documents

2026-04-12 · Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, AI in Healthcare · Depth: Advanced, quick

Summary

The paper presents a RAG chatbot that runs exclusively on local LLMs, developed and evaluated on internal Portuguese-language documents from a university hospital, including Standard Operating Procedures and technical manuals. Running locally addresses the hospital's requirements for information security, computational efficiency, and control over sensitive data. Retrieval quality was evaluated for dense embedding models using Mean Reciprocal Rank (MRR). Generation was analyzed in two scenarios: fixed context, where multiple chunks are passed to the model at once, and incremental page retrieval, where chunks are sent sequentially in retrieval-rank order. Four local LLMs were tested (MedGemma3:27B, Gemma3:27B, Gpt-oss:20B, and Mistral Small 3.1), with BERTScore as the measure of generation quality. In the fixed-context scenario, increasing context indiscriminately degraded generation quality; incremental page retrieval improved BERTScore values, and MedGemma3:27B performed best.

Key takeaway

For AI Architects and NLP Engineers designing RAG systems for sensitive domains like healthcare, context management directly affects answer quality. Passing more chunks raises the chance that the relevant passage is included, but in this study indiscriminately expanding the context degraded the generated output. Prefer incremental page retrieval, which improved generation quality in the paper's evaluation, and run local LLMs such as MedGemma3:27B on hospital documents to keep sensitive data under your control.

Key insights

Adaptive context control is crucial for reliable and efficient RAG systems using local LLMs in healthcare.

Principles

Adding context indiscriminately degrades RAG generation quality.
Sending chunks sequentially, in retrieval-rank order, improved generation quality (BERTScore) over the fixed-context setup.

Method

The study evaluated RAG generation using fixed context versus incremental page retrieval, measuring retrieval with MRR and generation with BERTScore across four local LLMs.
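As a reminder of what the retrieval metric measures, here is a minimal sketch of Mean Reciprocal Rank: each query contributes 1/rank of the first relevant chunk in its ranked results (0 if it never appears), and the scores are averaged over queries. The function name, the single-gold-chunk-per-query setup, and the toy IDs are assumptions for illustration, not the paper's evaluation code.

```python
from typing import Hashable, Sequence


def mean_reciprocal_rank(
    ranked_results: Sequence[Sequence[Hashable]],
    relevant: Sequence[Hashable],
) -> float:
    """Average of 1/rank of the gold chunk per query (0 when it is not retrieved).

    Assumes exactly one gold chunk ID per query (an illustrative simplification).
    """
    if len(ranked_results) != len(relevant):
        raise ValueError("need one gold chunk ID per ranked result list")
    if not ranked_results:
        return 0.0

    total = 0.0
    for ranking, gold in zip(ranked_results, relevant):
        for rank, chunk_id in enumerate(ranking, start=1):
            if chunk_id == gold:
                total += 1.0 / rank
                break  # only the first hit counts
    return total / len(ranked_results)


# Toy data: gold chunk at rank 1, at rank 2, and missing.
rankings = [["a", "b", "c"], ["x", "y", "z"], ["p", "q"]]
gold = ["a", "y", "missing"]
mrr = mean_reciprocal_rank(rankings, gold)  # (1 + 1/2 + 0) / 3
```

An MRR close to 1 means the relevant chunk usually sits at the top of the ranking, which is the property a rank-ordered, one-chunk-at-a-time strategy relies on.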

In practice

Prioritize local LLMs for sensitive hospital data.
Implement incremental context retrieval in RAG systems.
Consider MedGemma3:27B for healthcare RAG applications.
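
One way to sketch incremental context retrieval is a loop that offers the model one chunk at a time, in retrieval-rank order, and stops at the first usable answer. This summary does not describe the paper's prompt or stopping rule, so everything specific here is an assumption: the `NOT_FOUND` sentinel, the prompt wording, the `max_chunks` cap, and the `generate` callable (a stand-in for whatever local LLM client is used).

```python
from typing import Callable, Optional, Sequence, Tuple

NO_ANSWER = "NOT_FOUND"  # hypothetical sentinel the prompt asks the model to emit


def answer_incrementally(
    question: str,
    ranked_chunks: Sequence[str],
    generate: Callable[[str], str],
    max_chunks: int = 5,
) -> Tuple[Optional[str], int]:
    """Try chunks in rank order, one per call; return (answer, chunks_tried).

    `generate` is any function mapping a prompt to the model's reply
    (assumed interface, not a specific library's API).
    """
    candidates = ranked_chunks[:max_chunks]
    for tried, chunk in enumerate(candidates, start=1):
        prompt = (
            "Answer the question using only the context below. "
            f"If the context does not contain the answer, reply {NO_ANSWER}.\n\n"
            f"Context:\n{chunk}\n\nQuestion: {question}"
        )
        reply = generate(prompt).strip()
        if reply and reply != NO_ANSWER:
            return reply, tried  # stop early: no further chunks are sent
    return None, len(candidates)


# Stub model for demonstration: answers only when the right chunk is in the prompt.
def fake_llm(prompt: str) -> str:
    return "Wash hands for 40 seconds." if "40 seconds" in prompt else NO_ANSWER


chunks = [
    "Visiting hours are 2pm to 4pm.",
    "Hand washing must last 40 seconds.",
    "Gloves are stored in room 3.",
]
answer, tried = answer_incrementally("How long is hand washing?", chunks, fake_llm)
```

In this sketch the model never sees more than one chunk per call, so prompt length stays bounded regardless of how many chunks were retrieved; the trade-off is extra model calls when the relevant chunk is ranked low.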

Topics

RAG Chatbot
Local LLMs
Incremental Context Retrieval
Hospital Documents
MedGemma3:27B

Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.