End-to-End Evaluation of a RAG System for Hospital Documents in Portuguese
Summary
A study evaluated an end-to-end Retrieval-Augmented Generation (RAG) system designed for querying regulatory hospital documents in Portuguese. The research focused on optimizing the individual components (retrieval, re-ranking, and generation) within a resource-constrained environment. A hybrid dataset, combining synthetic data with expert validation, was created for the evaluation. Performance was assessed with quantitative metrics: Mean Reciprocal Rank (MRR), Normalized Discounted Cumulative Gain at rank 10 (NDCG@10), and BERTScore. The intfloat/multilingual-e5-small embedding model proved the most robust in retrieval, with a failure rate of only 1.4%. For re-ranking, the Reciprocal Rank Fusion (RRF) method was identified as optimal, balancing computational cost with performance. The final optimized architecture, integrating these components with the Gemini 2.5 Flash generator, provides an efficient and precise solution for decision support in hospital settings.
Key takeaway
For AI Architects and Engineers developing RAG systems for specialized domains like healthcare, optimizing each component individually is crucial. Your choice of embedding model (e.g., intfloat/multilingual-e5-small) and re-ranking method (e.g., RRF) directly affects system robustness and efficiency, especially in resource-constrained environments. Consider evaluating on a hybrid dataset (synthetic data plus expert validation) to check that results hold up in practice.
Key insights
Optimizing RAG components individually enhances performance for querying specialized documents in resource-limited settings.
Principles
- Hybrid datasets (synthetic data validated by experts) support more thorough RAG evaluation.
- Component optimization is critical for RAG efficiency.
Method
The methodology involved creating a hybrid dataset (synthetic and expert-validated) and quantitatively evaluating retrieval, re-ranking, and generation components using MRR, NDCG@10, and BERTScore.
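To make the two ranking metrics named above concrete, here is a minimal Python sketch of MRR and NDCG@10 using their standard textbook definitions. It is not the study's evaluation code: the function names, the input shapes (a ranked list of document IDs plus a set or dict of relevant IDs), and the sample data are all assumptions made for illustration.

```python
import math


def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs):
    """Average the reciprocal rank over (ranked_ids, relevant_ids) pairs, one per query."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(ranked, rel) for ranked, rel in runs) / len(runs)


def dcg_at_k(gains, k):
    """Discounted cumulative gain: the gain at 1-based position p is divided by log2(p + 1)."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))


def ndcg_at_k(ranked_ids, relevance, k=10):
    """NDCG@k: DCG of the returned ranking divided by DCG of the ideal ordering.

    `relevance` maps document ID -> graded relevance (hypothetical input format).
    """
    gains = [relevance.get(doc_id, 0.0) for doc_id in ranked_ids]
    ideal_gains = sorted(relevance.values(), reverse=True)
    ideal = dcg_at_k(ideal_gains, k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0


# Illustrative data only (not taken from the study).
sample_runs = [
    (["a", "b", "c"], {"b"}),  # first relevant hit at rank 2
    (["x", "y"], {"x"}),       # first relevant hit at rank 1
]
sample_mrr = mean_reciprocal_rank(sample_runs)
sample_ndcg = ndcg_at_k(["a", "b"], {"b": 1.0}, k=10)
```

In this toy data the first query contributes a reciprocal rank of 1/2 and the second contributes 1, so the MRR is 0.75. BERTScore is left out because it depends on a pretrained model rather than a closed-form formula.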
In practice
- Use intfloat/multilingual-e5-small for robust embeddings.
- Employ RRF for balanced re-ranking performance.
- Integrate Gemini 2.5 Flash for efficient generation.
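As a rough illustration of the RRF recommendation above, the following Python sketch implements the standard Reciprocal Rank Fusion formula, where each document scores the sum of 1 / (k + rank) over the input rankings. The summary does not say which rankings the study fused or which constant it used, so the two input lists and the value k = 60 (a commonly used default) are assumptions, and the function name is hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document receives sum(1 / (k + rank)) over the lists it appears in,
    with 1-based ranks. k = 60 is an assumed default, not a value from the study.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical rankings, e.g. one from a dense retriever and one from a lexical retriever.
dense_ranking = ["a", "b", "c"]
lexical_ranking = ["b", "c", "a"]
fused = reciprocal_rank_fusion([dense_ranking, lexical_ranking])
```

RRF needs only rank positions, not model scores, so it adds almost no compute on top of the retrievers. That fits the cost and performance balance the summary attributes to it.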
Topics
- Retrieval-Augmented Generation
- Hospital Documents
- Portuguese Language Processing
- Embedding Models
- Re-ranking
Best for: AI Architect, AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Paper Index on ACL Anthology.