From Conflict to Consensus: Boosting Medical Reasoning via Multi-Round Agentic RAG

2026-03-05 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Advanced, extended

Summary

MA-RAG (Multi-Round Agentic RAG) is a framework for improving medical reasoning in Large Language Models (LLMs). It targets hallucinations and outdated knowledge, problems that traditional Retrieval-Augmented Generation (RAG) methods handle poorly because of noisy token-level signals and a lack of multi-round refinement.

The framework runs an agentic refinement loop that iteratively evolves both the external evidence and the internal reasoning history, turning semantic conflict among candidate responses into actionable retrieval queries. It comprises three agents: a Solver Agent that generates diverse candidates, a Retrieval Agent that fetches evidence guided by that conflict, and a Ranking Agent that optimizes the history of reasoning traces. This design addresses "long-context degradation" and extends the "self-consistency principle" by treating inconsistency as a proactive signal.

Evaluated on seven diverse medical Q&A benchmarks, MA-RAG consistently outperformed competitive inference-time scaling and RAG baselines, raising average accuracy by 6.8 points over the Llama-2-7B-Chat backbone model to 69.7%. It was particularly effective on complex tasks such as MedXpertQA, where it showed a 37% improvement over the backbone. Ablation studies confirmed the critical contributions of the agentic components and of multi-round refinement.
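
The refinement loop described in the summary can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: every name here (`Candidate`, `State`, `agreement`, `conflict_query`, `ma_rag_loop`, and the `toy_*` stand-ins for the three agents) is hypothetical, conflict is approximated by simple majority-vote agreement over answer labels rather than a semantic measure, and the stopping rule is a guess at one reasonable choice.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Candidate:
    answer: str     # final answer label, e.g. "A"
    reasoning: str  # reasoning trace that produced the answer


@dataclass
class State:
    question: str
    evidence: List[str] = field(default_factory=list)       # external evidence
    history: List[Candidate] = field(default_factory=list)  # kept reasoning traces


def agreement(cands: List[Candidate]) -> float:
    """Fraction of candidates that share the most common answer."""
    counts = Counter(c.answer for c in cands)
    return counts.most_common(1)[0][1] / len(cands)


def conflict_query(question: str, cands: List[Candidate]) -> str:
    """Build one retrieval query from the competing answers (assumed format)."""
    by_answer: Dict[str, str] = {}
    for c in cands:
        by_answer.setdefault(c.answer, c.reasoning)  # one trace per distinct answer
    claims = "; ".join(f"{a}: {r}" for a, r in sorted(by_answer.items()))
    return f"{question} | resolve: {claims}"


def ma_rag_loop(
    question: str,
    solve: Callable[[State, int], List[Candidate]],        # Solver Agent stand-in
    retrieve: Callable[[str], List[str]],                  # Retrieval Agent stand-in
    rank: Callable[[List[Candidate]], List[Candidate]],    # Ranking Agent stand-in
    max_rounds: int = 3,
    n_candidates: int = 4,
    stop_at: float = 1.0,
) -> str:
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    state = State(question)
    cands: List[Candidate] = []
    for _ in range(max_rounds):
        cands = solve(state, n_candidates)
        if agreement(cands) >= stop_at:
            break  # consensus reached, no further retrieval needed
        # Disagreement is used as the signal for what to retrieve next.
        state.evidence.extend(retrieve(conflict_query(question, cands)))
        # Keep only the traces the ranker selects, to limit context growth.
        state.history = rank(state.history + cands)
    return Counter(c.answer for c in cands).most_common(1)[0][0]


# Toy stand-ins so the sketch runs without an LLM or a search index.
def toy_solve(state: State, n: int) -> List[Candidate]:
    if state.evidence:  # once evidence exists, the toy solver converges on "B"
        return [Candidate("B", "evidence supports B") for _ in range(n)]
    return [Candidate("A" if i % 2 == 0 else "B", f"guess {i}") for i in range(n)]


def toy_retrieve(query: str) -> List[str]:
    return [f"passage for: {query}"]


def toy_rank(cands: List[Candidate]) -> List[Candidate]:
    return cands[:2]  # placeholder: keep the first two traces


if __name__ == "__main__":
    print(ma_rag_loop("Which drug is first-line?", toy_solve, toy_retrieve, toy_rank))
```

In this toy run the candidates split evenly in the first round, which triggers one retrieval; the second round reaches consensus and the loop stops early. In a real system the three callables would wrap LLM calls and a retriever, and the agreement check would likely compare answers semantically rather than by exact label.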

Key takeaway

MA-RAG, a Multi-Round Agentic RAG framework, significantly boosts LLM medical reasoning by iteratively refining both responses and external evidence. It uses semantic conflict among candidate answers to guide targeted retrieval and optimizes the reasoning history, achieving a 6.8-point average accuracy improvement over the backbone model and a 37% gain on the complex MedXpertQA benchmark. By mitigating hallucinations and long-context degradation, the approach improves reliability, which is crucial for safety-critical medical AI.

Topics

Medical Reasoning
Retrieval-Augmented Generation
Agentic AI
Multi-Round Refinement
Test-Time Scaling

Code references

NJU-RL/MA-RAG

Best for: Research Scientist, AI Researcher, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.