SEMA-RAG: A Self-Evolving Multi-Agent Retrieval-Augmented Generation Framework for Medical Reasoning

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Health & Medical Research) · Depth: Expert, long

Summary

SEMA-RAG is a Self-Evolving Multi-Agent Retrieval-Augmented Generation framework for medical question answering, proposed to address the limitations of static, single-round RAG in clinical reasoning. Standard RAG often translates questions into retrieval queries poorly and lacks iterative feedback on whether the retrieved evidence is sufficient, which leads to unreliable evidence chains. SEMA-RAG splits this work across three specialized agents: an Interpreter Agent for clinical schema interpretation, an Explorer Agent for sufficiency-driven self-evolving retrieval, and an Arbiter Agent for evidence adjudication and answer selection. Evaluated on five medical benchmarks (MMLU-Med, MedQA-US, MedMCQA, PubMedQA*, BioASQ-Y/N) with five LLM backbones (deepseek-v3.1, kimi-k2, qwen3-coder-plus, gemini-2.0-flash, glm-4.0-flash), SEMA-RAG consistently outperformed the strongest baseline, by an average of 6.46 accuracy points per backbone. Both the multi-agent architecture and the iterative evidence exploration are reported as crucial to these gains.

Key takeaway

For AI scientists and machine learning engineers building RAG systems for high-stakes domains such as healthcare: consider a multi-agent, self-evolving framework. Single-round RAG may be insufficient for complex reasoning tasks and can yield suboptimal accuracy. Decoupling the tasks and retrieving iteratively until the evidence is sufficient can make evidence chains more reliable and improve overall performance, as SEMA-RAG's average gain of 6.46 accuracy points over the strongest baseline shows.

Key insights

Multi-agent RAG with self-evolving, sufficiency-driven retrieval significantly improves medical question answering accuracy.

Principles

Method

SEMA-RAG uses an Interpreter Agent to structure questions into a clinical schema, an Explorer Agent for self-evolving, multi-round retrieval based on evidence sufficiency, and an Arbiter Agent to adjudicate evidence and select answers.
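
The three-agent flow described above can be sketched in Python. This is a minimal illustration, not the paper's implementation. The prompts, the `ClinicalSchema` fields, the `LLM` and `Retriever` callables, the `max_rounds` cap, and the fallback to the first option are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical interfaces: an LLM maps a prompt to text,
# a retriever maps a query to a list of passages.
LLM = Callable[[str], str]
Retriever = Callable[[str], List[str]]


@dataclass
class ClinicalSchema:
    """Structured view of a question (field names are illustrative)."""
    question: str
    options: Dict[str, str]
    findings: List[str] = field(default_factory=list)


def interpreter_agent(question: str, options: Dict[str, str], llm: LLM) -> ClinicalSchema:
    """Turn the raw question into a structured schema."""
    raw = llm(f"INTERPRET: list the key clinical findings, one per line.\n{question}")
    findings = [line.strip() for line in raw.splitlines() if line.strip()]
    return ClinicalSchema(question, options, findings)


def explorer_agent(schema: ClinicalSchema, retriever: Retriever, llm: LLM,
                   max_rounds: int = 3) -> List[str]:
    """Retrieve in rounds until the LLM judges the evidence sufficient."""
    evidence: List[str] = []
    query = "; ".join(schema.findings) or schema.question
    for _ in range(max_rounds):
        for passage in retriever(query):
            if passage not in evidence:  # keep each passage once
                evidence.append(passage)
        listing = "\n".join(f"[E] {p}" for p in evidence)
        verdict = llm(
            "JUDGE: reply SUFFICIENT if the evidence answers the question, "
            f"otherwise reply with a refined search query.\n{schema.question}\n{listing}"
        ).strip()
        if verdict.upper().startswith("SUFFICIENT"):
            break
        if verdict:
            query = verdict  # the query evolves from the sufficiency feedback
    return evidence


def arbiter_agent(schema: ClinicalSchema, evidence: List[str], llm: LLM) -> str:
    """Weigh the evidence and pick one option key."""
    options = "\n".join(f"{key}. {text}" for key, text in schema.options.items())
    listing = "\n".join(f"[E] {p}" for p in evidence)
    reply = llm(
        f"ARBITRATE: answer with one option letter.\n{schema.question}\n{options}\n{listing}"
    )
    choice = reply.strip().upper()[:1]
    # Assumed fallback: use the first option if the reply is not a valid key.
    return choice if choice in schema.options else next(iter(schema.options))


def answer_question(question: str, options: Dict[str, str], llm: LLM,
                    retriever: Retriever, max_rounds: int = 3) -> Tuple[str, List[str]]:
    """Run Interpreter, Explorer, and Arbiter in sequence."""
    schema = interpreter_agent(question, options, llm)
    evidence = explorer_agent(schema, retriever, llm, max_rounds)
    return arbiter_agent(schema, evidence, llm), evidence
```

Passing the model and the retriever in as plain callables keeps the sketch independent of any particular backbone or search index. The `max_rounds` cap guarantees termination even if the sufficiency check never passes.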

Topics

Best for: AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.