ArogyaSutra: A Multi-Agent Framework for Multimodal Medical Reasoning in Indic Languages

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, AI in Healthcare · Depth: Expert, extended

Summary

ArogyaSutra is an actor-critic-based multi-agent framework designed for multilingual multimodal medical reasoning in Indic languages, addressing limitations of English-centric MLLMs in low-resource healthcare settings. It integrates tool grounding with dual-memory mechanisms for step-wise, reasoning-aware decision-making. The framework is supported by ArogyaBodha, a large-scale multilingual multimodal medical question-answer dataset. ArogyaBodha comprises 40,857 samples across English and seven major Indian languages (Bengali, Hindi, Assamese, Tamil, Telugu, Punjabi, Marathi), covering 31 body systems and 21 clinical domains. Experiments show ArogyaSutra, using a Qwen2.5-VL-7B backbone, achieves an average accuracy of 43.40%, outperforming GPT-4.0 (39.30%) by +4.1 points and its base model by +9.2 points, demonstrating improved reasoning and multilingual alignment.
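The reported margins can be checked with a few lines of arithmetic. The sketch below uses only the figures quoted in the summary. The base-model accuracy and the relative gain are derived values, not numbers reported in the summary, and the variable names are illustrative.

```python
# Figures quoted in the summary (accuracy in percent).
arogyasutra_acc = 43.40
gpt_acc = 39.30
gain_over_base = 9.2  # stated gain over the Qwen2.5-VL-7B base model

# Absolute gain over the GPT baseline, in percentage points.
gain_over_gpt = round(arogyasutra_acc - gpt_acc, 2)

# Derived, not reported in the summary: the base-model accuracy implied
# by the stated +9.2 point gain.
implied_base_acc = round(arogyasutra_acc - gain_over_base, 2)

# Derived: the gain over the GPT baseline expressed as a relative percentage.
relative_gain_over_gpt = round(100 * gain_over_gpt / gpt_acc, 1)

print(gain_over_gpt, implied_base_acc, relative_gain_over_gpt)
```

Under these figures the base model would sit at roughly 34.2% average accuracy, so the stated +4.1 point margin corresponds to a relative improvement of about 10% over the GPT baseline.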

Key takeaway

For AI scientists developing healthcare MLLMs for linguistically diverse populations, ArogyaSutra offers a framework for improving both reasoning and language fidelity. Consider combining an actor-critic architecture with tool-grounded perception and dual-memory mechanisms to handle complex multimodal queries in low-resource languages. This approach can make diagnostic explanations more trustworthy and access to AI-driven healthcare assistance more equitable, especially in regions such as rural India.

Key insights

Multimodal medical reasoning in Indic languages is enhanced by an actor-critic framework with tool grounding and dual-memory.

Method

ArogyaSutra's Actor processes inputs, invokes visual grounding tools, and predicts reasoning steps. The Critic evaluates outputs, provides corrective feedback (English for linguistic, Indic for logical), and updates dual-memory for iterative refinement.
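The Actor-Critic loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `DualMemory`, `Feedback`, and `refine`, the split of the dual memory into linguistic and logical notes, the round limit, and the toy actor and critic are all hypothetical. In the framework the Actor and Critic are model-backed agents and the Actor calls real visual grounding tools; here the tool output is passed in as a plain string.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class DualMemory:
    # Assumption: the summary only says "dual-memory"; storing linguistic
    # and logical notes separately is an illustrative guess.
    linguistic: List[str] = field(default_factory=list)
    logical: List[str] = field(default_factory=list)


@dataclass
class Feedback:
    accepted: bool
    linguistic_note: Optional[str] = None
    logical_note: Optional[str] = None


# Hypothetical signatures: (question, tool evidence, memory) -> draft answer.
Actor = Callable[[str, str, DualMemory], str]
Critic = Callable[[str], Feedback]


def refine(question: str, evidence: str, actor: Actor, critic: Critic,
           max_rounds: int = 3) -> Tuple[str, DualMemory, int]:
    """Run the actor, let the critic review, and store feedback for the next round."""
    memory = DualMemory()
    draft = ""
    rounds = 0
    for rounds in range(1, max_rounds + 1):
        draft = actor(question, evidence, memory)
        feedback = critic(draft)
        if feedback.accepted:
            break
        # Corrective feedback is written to memory so the next draft can use it.
        if feedback.linguistic_note:
            memory.linguistic.append(feedback.linguistic_note)
        if feedback.logical_note:
            memory.logical.append(feedback.logical_note)
    return draft, memory, rounds


# Toy stand-ins so the loop can be run without any model.
def toy_actor(question: str, evidence: str, memory: DualMemory) -> str:
    answer = "option B" if memory.logical else "undecided"
    return f"evidence: {evidence} | answer: {answer}"


def toy_critic(draft: str) -> Feedback:
    if "undecided" in draft:
        return Feedback(False, logical_note="commit to the option the evidence supports")
    return Feedback(True)


final_draft, final_memory, rounds_used = refine(
    "Which option matches the scan?", "region of interest found", toy_actor, toy_critic
)
print(rounds_used, final_draft)
```

With the toy stand-ins, the first draft is rejected, the Critic's logical note is written to memory, and the second draft is accepted. Keeping the two kinds of notes apart lets a later Actor call address language errors and reasoning errors independently.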

Best for: AI Scientist, Research Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.