Efficient and Effective Internal Memory Retrieval for LLM-Based Healthcare Prediction

2026-04-10 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

Keys-to-Knowledge (K2K) is a novel framework for making Large Language Models (LLMs) more reliable and efficient in healthcare prediction tasks. It addresses issues such as hallucinations and the high latency of traditional Retrieval-Augmented Generation (RAG) systems. Instead of searching an external knowledge base, K2K encodes essential clinical information directly into the LLM's parameter space and accesses it through internal, key-based lookup. Retrieval from this internal key-value memory is therefore fast and avoids the inference-time overhead of an external search. The framework has three core modules (Internal Memory Construction, Activation-Guided Probe Construction, and Cross-Attentive Reranking) that improve retrieval quality and dynamically integrate diverse knowledge. K2K achieved state-of-the-art performance on four benchmark healthcare outcome prediction datasets, including mortality and readmission prediction on MIMIC-III and MIMIC-IV, and was more efficient than existing RAG and prompt-based methods.

Key takeaway

For AI Engineers developing LLM-based healthcare applications, K2K delivers better prediction performance and efficiency than traditional RAG. You should consider implementing K2K's internal, key-based knowledge retrieval to reduce latency and improve prediction accuracy in time-sensitive clinical settings, especially for tasks like mortality and readmission prediction. Because knowledge is read from the model's own parameters, inference is faster and stays contextually grounded without the computational cost of querying an external knowledge base.

Key insights

K2K enhances LLM healthcare prediction by enabling rapid, internal, key-based knowledge retrieval, bypassing external RAG latency.

Principles

FFN layers implicitly store factual knowledge.
Internal key-value memory can replace external retrieval.
Activation-guided probes improve retrieval accuracy.
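The first two principles rest on the common view of a transformer FFN layer as a key-value memory: rows of the first projection act as keys, the activation of each key against the hidden state says how strongly that memory slot fires, and the output is the activation-weighted sum of the second projection's rows (the values). The sketch below shows that read path plus a LoRA-style low-rank update to the key matrix, as one way knowledge could be written into the keys. It is a minimal illustration under stated assumptions, not K2K's implementation: the ReLU nonlinearity, the toy shapes, and the helper names (`ffn_memory_read`, `lora_update_keys`, `top_k_slots`) are all hypothetical.

```python
import numpy as np

def ffn_memory_read(x, keys, values):
    """Read an FFN layer as key-value memory (hypothetical helper).

    x:      (d,)   hidden state acting as the query
    keys:   (n, d) rows of the first FFN projection, one key per memory slot
    values: (n, d) rows of the second FFN projection, one value per slot
    """
    coeffs = np.maximum(keys @ x, 0.0)  # ReLU activation: how strongly each key fires
    return coeffs, coeffs @ values      # output = activation-weighted sum of values

def lora_update_keys(keys, A, B, alpha):
    """Add a LoRA-style low-rank delta to the key matrix (assumed write path).

    keys: (n, d); B: (n, r); A: (r, d). Scaling by alpha / r follows standard LoRA.
    """
    r = A.shape[0]
    return keys + (alpha / r) * (B @ A)

def top_k_slots(coeffs, k):
    """Indices of the k most strongly activated memory slots."""
    return np.argsort(coeffs)[::-1][:k]

# Toy memory with 3 slots in a 2-dimensional hidden space.
keys = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
values = np.array([[10.0, 0.0], [0.0, 20.0], [5.0, 5.0]])
x = np.array([2.0, 1.0])

coeffs, out = ffn_memory_read(x, keys, values)  # coeffs [2, 1, 0], out [20, 20]

# Rank-1 update that edits only slot 2's key so that it now fires for x.
A = np.array([[0.0, 1.0]])          # (r=1, d=2)
B = np.array([[0.0], [0.0], [3.0]])  # (n=3, r=1)
new_keys = lora_update_keys(keys, A, B, alpha=1.0)
coeffs2, out2 = ffn_memory_read(x, new_keys, values)  # coeffs2 [2, 1, 1], out2 [25, 25]
```

In the usage lines, slot 2 is silent before the update and contributes its value afterwards, which is the sense in which a low-rank change to the keys can make new knowledge retrievable without an external index.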

Method

K2K constructs internal memory via LoRA, uses Mahalanobis-guided probe construction to make queries more discriminative, and employs cross-attentive reranking to dynamically integrate retrieved document and graph knowledge for the final prediction.
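This summary does not say how K2K turns Mahalanobis distances into probe queries, so the sketch below shows only the distance computation such a step would rest on. It assumes per-class Gaussian statistics of hidden activations (one mean per outcome class and a single shared covariance, as in LDA) and measures how far a patient's activation lies from each class. Unlike Euclidean distance, Mahalanobis distance accounts for differently scaled and correlated activation dimensions, which is why it can sharpen discriminability. The helper names (`fit_class_gaussians`, `probe_distances`) and the ridge term `eps` are hypothetical.

```python
import numpy as np

def fit_class_gaussians(acts, labels, eps=1e-3):
    """Per-class mean activations plus one shared (tied) covariance."""
    classes = np.unique(labels)
    means = np.stack([acts[labels == c].mean(axis=0) for c in classes])
    centered = np.vstack([acts[labels == c] - means[i] for i, c in enumerate(classes)])
    cov = centered.T @ centered / len(acts)
    cov = cov + eps * np.eye(acts.shape[1])  # small ridge keeps the covariance invertible
    return classes, means, np.linalg.inv(cov)

def mahalanobis(x, mean, cov_inv):
    """sqrt((x - mean)^T Sigma^{-1} (x - mean))."""
    diff = x - mean
    return float(np.sqrt(diff @ cov_inv @ diff))

def probe_distances(x, means, cov_inv):
    """Hypothetical probe signal: distance from a query activation to each class."""
    return np.array([mahalanobis(x, m, cov_inv) for m in means])

# Toy activations for two outcome classes (e.g. survived = 0, died = 1).
acts = np.array([[0, 0], [2, 0], [0, 2], [2, 2],
                 [10, 10], [12, 10], [10, 12], [12, 12]], dtype=float)
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])

classes, means, cov_inv = fit_class_gaussians(acts, labels)
query = np.array([1.5, 1.0])
dists = probe_distances(query, means, cov_inv)
nearest = classes[np.argmin(dists)]  # class whose activation distribution is closest
```

With an identity covariance the distance reduces to Euclidean distance; the benefit appears when activation dimensions have very different variances.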

In practice

Infuse domain knowledge using LoRA into FFN keys.
Use Mahalanobis distance for robust probe query construction.
Implement cross-attention for dynamic knowledge reranking.
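The third practice item can be illustrated with single-head scaled dot-product cross-attention: a patient representation acts as the query, retrieved document and graph knowledge vectors act as keys, and the attention weights both rank the candidates and weight their fusion. This is a minimal sketch under assumptions, not K2K's architecture: it omits the learned query, key, and value projections a real cross-attention layer would have, and the helper `cross_attentive_rerank`, the `top_k` cutoff, and renormalizing weights over the kept candidates are hypothetical choices.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_attentive_rerank(query, doc_vecs, graph_vecs, top_k):
    """Rank document and graph candidates by attention to one query, then fuse the top_k.

    query: (d,); doc_vecs: (n_doc, d); graph_vecs: (n_graph, d).
    Returns [(source, index_within_source), ...] and the fused (d,) knowledge vector.
    """
    cands = np.vstack([doc_vecs, graph_vecs])
    n_doc = len(doc_vecs)
    weights = softmax(cands @ query / np.sqrt(query.shape[0]))  # scaled dot-product scores
    order = np.argsort(weights)[::-1][:top_k]                   # highest weight first
    kept = weights[order] / weights[order].sum()                # renormalize over kept items
    fused = kept @ cands[order]
    ranking = [("doc", int(i)) if i < n_doc else ("graph", int(i - n_doc)) for i in order]
    return ranking, fused

query = np.array([1.0, 0.0])
doc_vecs = np.array([[2.0, 0.0], [0.0, 1.0]])
graph_vecs = np.array([[1.0, 0.0]])
ranking, fused = cross_attentive_rerank(query, doc_vecs, graph_vecs, top_k=2)
# ranking == [("doc", 0), ("graph", 0)]
```

Scoring both sources in one softmax lets a graph fact outrank a document passage (or vice versa) per patient, which is the "dynamic integration" the method describes; a fixed-ratio merge of the two sources could not adapt this way.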

Topics

Keys-to-Knowledge (K2K)
Internal Memory Retrieval
Healthcare Prediction
Retrieval-Augmented Generation
Mahalanobis Distance

Best for: AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.