ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages
Summary
ArogyaSutra is an actor-critic multi-agent framework for multilingual multimodal medical reasoning in Indic languages. It targets a limitation of English-centric multimodal large language models (MLLMs) in low-resource healthcare settings. The framework combines tool grounding with dual-memory mechanisms to support step-wise, reasoning-aware decision-making. It is accompanied by ArogyaBodha, a large-scale multilingual multimodal medical question-answering dataset of 40,857 samples in English and seven major Indian languages (Bengali, Hindi, Assamese, Tamil, Telugu, Punjabi, and Marathi), covering 31 body systems and 21 clinical domains. In experiments, ArogyaSutra with a Qwen2.5-VL-7B backbone reaches an average accuracy of 43.40%, outperforming GPT-4.0 (39.30%) by 4.1 points and its base model by 9.2 points, which indicates improved reasoning and multilingual alignment.
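The reported margins can be checked with simple arithmetic. The sketch below reproduces the 4.1-point gain over GPT-4.0 and derives two figures that are not quoted in the summary: the base model's implied accuracy (43.40 minus the stated 9.2-point gain) and the relative gain over GPT-4.0. Both derived numbers are computed here, not taken from the paper.

```python
# Reported average accuracies (%) from the summary.
AROGYASUTRA = 43.40
GPT4 = 39.30
GAIN_OVER_BASE = 9.2  # reported gain (points) over the Qwen2.5-VL-7B base model

def point_gain(model_acc: float, baseline_acc: float) -> float:
    """Absolute gain in accuracy points, rounded to one decimal."""
    return round(model_acc - baseline_acc, 1)

gain_vs_gpt4 = point_gain(AROGYASUTRA, GPT4)
# Derived, not reported: base-model accuracy implied by the +9.2-point gain.
implied_base = round(AROGYASUTRA - GAIN_OVER_BASE, 1)
# Derived, not reported: gain relative to GPT-4.0, as a percentage of its score.
relative_vs_gpt4 = round(100 * (AROGYASUTRA - GPT4) / GPT4, 1)

print(gain_vs_gpt4)      # 4.1
print(implied_base)      # 34.2
print(relative_vs_gpt4)  # 10.4
```

Note the distinction: a 4.1-point absolute gain corresponds to roughly a 10% relative improvement over GPT-4.0's score.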
Key takeaway
For AI scientists building healthcare MLLMs for linguistically diverse populations, ArogyaSutra offers a framework for improving both reasoning quality and language fidelity. Consider combining an actor-critic architecture with tool-grounded perception and dual memory when handling complex multimodal queries in low-resource languages. The approach aims to make diagnostic explanations more trustworthy and AI-driven healthcare assistance more equitably accessible, particularly in settings such as rural India.
Key insights
Multimodal medical reasoning in Indic languages is enhanced by an actor-critic framework with tool grounding and dual-memory.
Principles
- Actor-critic frameworks enhance step-wise reasoning.
- Dual-memory mechanisms track and correct errors.
- Tool grounding extracts clinically relevant visual evidence.
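The summary does not say what the two memories contain. One plausible reading, used here purely as an assumption, is a short-term working memory of the current query's reasoning steps plus a persistent error memory of past corrections, so that mistakes flagged once can be avoided on later queries. The class `DualMemory` and its methods are hypothetical names for illustration.

```python
from dataclasses import dataclass, field

@dataclass
class DualMemory:
    """Hypothetical dual memory: per-query working steps plus a persistent error log."""
    working: list = field(default_factory=list)  # reasoning steps for the current query
    errors: dict = field(default_factory=dict)   # error type -> list of corrective notes

    def record_step(self, step: str) -> None:
        self.working.append(step)

    def record_error(self, error_type: str, note: str) -> None:
        self.errors.setdefault(error_type, []).append(note)

    def context(self, max_notes: int = 3) -> str:
        """Render the steps and the most recent notes per error type as prompt context."""
        lines = [f"Step {i + 1}: {s}" for i, s in enumerate(self.working)]
        for error_type, notes in self.errors.items():
            lines.extend(f"Avoid ({error_type}): {n}" for n in notes[-max_notes:])
        return "\n".join(lines)

    def reset_working(self) -> None:
        """Start a new query: clear the working steps but keep the error log."""
        self.working.clear()

mem = DualMemory()
mem.record_step("Locate the relevant region in the image")
mem.record_error("logical", "Answer must cite the grounded image region")
print(mem.context())
mem.reset_working()
print(len(mem.working), len(mem.errors))  # 0 1
```

Keeping the error log across `reset_working` calls is the design choice that lets corrections outlive a single query.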
Method
In ArogyaSutra, the Actor processes the multimodal input, invokes visual grounding tools, and predicts the next reasoning step. The Critic evaluates the Actor's output, returns corrective feedback (in English for linguistic feedback and in the Indic language for logical feedback), and updates the dual memory so that the next iteration can refine the answer.
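The Actor/Critic cycle described above can be sketched as a bounded refinement loop. This is a toy under stated assumptions, not the paper's implementation: `crop_roi` stands in for a visual grounding tool, `toy_actor` and `toy_critic` replace the MLLM agents with hand-written rules, the two lists in `Memory` are a simplified stand-in for the dual memory, and `feedback_language` encodes one reading of the English/Indic feedback split. All of these names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Feedback:
    accepted: bool
    kind: Optional[str] = None  # assumed labels: "linguistic" or "logical"
    lang: str = "en"            # language the feedback is written in
    message: str = ""

@dataclass
class Memory:
    steps: list = field(default_factory=list)        # Actor's reasoning steps
    corrections: list = field(default_factory=list)  # Critic feedback history

def feedback_language(kind: str, query_lang: str) -> str:
    """Assumed reading: English for linguistic feedback, query language for logical."""
    return "en" if kind == "linguistic" else query_lang

def crop_roi(image, box):
    """Toy visual-grounding tool: return the sub-grid inside box = (r0, c0, r1, c1)."""
    r0, c0, r1, c1 = box
    return [row[c0:c1] for row in image[r0:r1]]

ROI_BOX = (1, 1, 3, 3)  # hypothetical region of interest

def toy_actor(query, image, memory):
    if not memory.corrections:
        # Naive first pass: answer without calling the grounding tool.
        return "normal", "Answered without inspecting ROI"
    peak = max(max(row) for row in crop_roi(image, ROI_BOX))
    answer = "opacity present" if peak > 0.5 else "normal"
    return answer, f"Inspected ROI, peak intensity {peak}"

def toy_critic(query, query_lang, answer, memory):
    # Accept only answers whose latest step is grounded in the image region.
    if "Inspected ROI" in memory.steps[-1]:
        return Feedback(accepted=True)
    kind = "logical"
    return Feedback(False, kind, feedback_language(kind, query_lang),
                    "Ground the answer in the region of interest.")

def refine(actor, critic, query, query_lang, image, max_rounds=3):
    memory = Memory()
    answer = None
    for _ in range(max_rounds):
        answer, step = actor(query, image, memory)
        memory.steps.append(step)
        fb = critic(query, query_lang, answer, memory)
        if fb.accepted:
            break
        memory.corrections.append(fb)  # visible to the Actor on the next round
    return answer, memory

image = [
    [0.1, 0.1, 0.1, 0.1],
    [0.1, 0.9, 0.2, 0.1],
    [0.1, 0.2, 0.3, 0.1],
    [0.1, 0.1, 0.1, 0.1],
]
answer, memory = refine(toy_actor, toy_critic, "Is there an opacity?", "hi", image)
print(answer)             # opacity present
print(len(memory.steps))  # 2
```

The `max_rounds` cap keeps the loop bounded even if the Critic never accepts; the last answer is returned in that case.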
In practice
- Use ArogyaBodha for multilingual medical question answering.
- Implement tool-based visual grounding for MLLMs.
- Apply an actor-critic loop with memory for iterative refinement.
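When evaluating on a multilingual set such as ArogyaBodha, it helps to report per-language accuracy and to state how the average is taken. The summary does not say whether its "average accuracy" is a macro average (mean over languages) or a micro average (over all samples); the sketch computes both. The record fields `lang`, `gold`, and `pred` and the toy records are assumptions, not the dataset's actual schema.

```python
from collections import defaultdict

def per_language_accuracy(records):
    """Accuracy (%) per language for records with hypothetical 'lang', 'gold', 'pred' keys."""
    totals, correct = defaultdict(int), defaultdict(int)
    for r in records:
        totals[r["lang"]] += 1
        correct[r["lang"]] += int(r["pred"] == r["gold"])
    return {lang: 100.0 * correct[lang] / totals[lang] for lang in totals}

def macro_average(acc_by_lang):
    """Unweighted mean over languages: each language counts equally."""
    return sum(acc_by_lang.values()) / len(acc_by_lang)

def micro_average(records):
    """Accuracy over all samples: larger languages weigh more."""
    return 100.0 * sum(r["pred"] == r["gold"] for r in records) / len(records)

# Toy records for illustration only.
records = [
    {"lang": "hi", "gold": "B", "pred": "B"},
    {"lang": "hi", "gold": "A", "pred": "C"},
    {"lang": "ta", "gold": "D", "pred": "D"},
    {"lang": "en", "gold": "A", "pred": "A"},
]
acc = per_language_accuracy(records)
print(acc)                           # {'hi': 50.0, 'ta': 100.0, 'en': 100.0}
print(round(macro_average(acc), 1))  # 83.3
print(micro_average(records))        # 75.0
```

The gap between 83.3 and 75.0 on the same predictions shows why the averaging scheme should be stated alongside any headline number.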
Topics
- Multimodal Medical Reasoning
- Indic Languages
- Multi-Agent Frameworks
- Actor-Critic Learning
- Visual Tool Grounding
- ArogyaBodha Dataset
Best for: AI Scientist, Research Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.