End-to-End Evaluation of a RAG System for Hospital Documents in Portuguese

2026-04-12 · Source: Paper Index on ACL Anthology · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, quick

Summary

A study evaluated an end-to-end Retrieval-Augmented Generation (RAG) system for querying regulatory hospital documents in Portuguese. The research optimized each component (retrieval, re-ranking, and generation) separately under resource constraints. Evaluation used a hybrid dataset that combines synthetic data with expert validation, and performance was measured with quantitative metrics such as Mean Reciprocal Rank (MRR), NDCG@10, and BERTScore. In retrieval, the intfloat/multilingual-e5-small embedding model showed superior robustness, with a failure rate of only 1.4%. For re-ranking, Reciprocal Rank Fusion (RRF) was identified as the best option because it balances computational cost against performance. The final architecture combines these components with the Gemini 2.5 Flash generator and is presented as an efficient and precise solution for decision support in hospital settings.

Key takeaway

For AI architects and engineers building RAG systems for specialized domains such as healthcare, optimizing each component individually is crucial. The choice of embedding model (e.g., intfloat/multilingual-e5-small) and re-ranking method (e.g., RRF) directly affects system robustness and efficiency, especially in resource-constrained environments. Consider evaluating on a hybrid dataset (synthetic data plus expert validation) to ensure practical applicability.

Key insights

Optimizing RAG components individually enhances performance for querying specialized documents in resource-limited settings.

Principles

Hybrid datasets, which combine synthetic data with expert validation, improve RAG evaluation.
Component optimization is critical for RAG efficiency.

Method

The methodology involved creating a hybrid dataset (synthetic and expert-validated) and quantitatively evaluating retrieval, re-ranking, and generation components using MRR, NDCG@10, and BERTScore.
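
As a rough illustration of two of the ranking metrics named above, the following sketch computes MRR and NDCG@10 in plain Python. The function names, document IDs, and relevance labels are hypothetical and are not taken from the paper. This summary does not describe the study's evaluation code or relevance scheme, so the linear-gain form of NDCG used here is an assumption (one common variant). BERTScore is left out because it requires a pretrained model.

```python
import math


def mean_reciprocal_rank(ranked_lists, relevant_sets):
    """MRR: average over queries of 1 / rank of the first relevant document.

    ranked_lists: one ranked list of doc IDs per query (hypothetical IDs).
    relevant_sets: one set of relevant doc IDs per query.
    A query with no relevant document in its list contributes 0.
    """
    if not ranked_lists:
        return 0.0
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, doc_id in enumerate(ranked, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(ranked_lists)


def ndcg_at_k(ranked, relevance, k=10):
    """NDCG@k for one query, using linear gain (an assumed variant).

    ranked: ranked list of doc IDs returned by the system.
    relevance: dict mapping doc ID -> graded relevance (missing means 0).
    """
    dcg = sum(
        relevance.get(doc_id, 0) / math.log2(position + 1)
        for position, doc_id in enumerate(ranked[:k], start=1)
    )
    ideal_gains = sorted(relevance.values(), reverse=True)[:k]
    idcg = sum(
        gain / math.log2(position + 1)
        for position, gain in enumerate(ideal_gains, start=1)
    )
    return dcg / idcg if idcg > 0 else 0.0


if __name__ == "__main__":
    # Two toy queries: the first hit is at rank 2, then at rank 1.
    runs = [["doc_a", "doc_b"], ["doc_x", "doc_y", "doc_z"]]
    gold = [{"doc_b"}, {"doc_x"}]
    print("MRR:", mean_reciprocal_rank(runs, gold))
    print("NDCG@10:", ndcg_at_k(["doc_b", "doc_a"], {"doc_a": 1}))
```

Averaging ndcg_at_k over all queries in the evaluation set gives the system-level NDCG@10.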

In practice

Use intfloat/multilingual-e5-small for robust embeddings.
Employ RRF for balanced re-ranking performance.
Integrate Gemini 2.5 Flash for efficient generation.
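
To make the RRF recommendation concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. Each document's fused score is the sum of 1 / (k + rank) over the input rankings. The constant k = 60 is the value commonly used in the RRF literature; the value used in the study is not stated in this summary. The two input rankings (a dense one and a lexical one) and all document IDs are hypothetical, since the summary does not say which rankers were fused.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs into one list.

    rankings: iterable of ranked lists, best document first.
    k: smoothing constant (60 is a common default, assumed here).
    Returns doc IDs sorted by fused score, highest first; ties are
    broken alphabetically so the output is deterministic.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


if __name__ == "__main__":
    # Hypothetical outputs of two retrievers for the same query.
    dense_ranking = ["d1", "d2", "d3"]
    lexical_ranking = ["d2", "d3", "d1"]
    print(reciprocal_rank_fusion([dense_ranking, lexical_ranking]))
```

RRF needs only rank positions, not model scores, so it adds almost no compute on top of the retrievers themselves. This fits the cost-versus-performance balance described above.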

Topics

Retrieval-Augmented Generation
Hospital Documents
Portuguese Language Processing
Embedding Models
Re-ranking

Best for: AI Architect, AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.