How Good LLMs Are at Answering Bangla Medical Visual Questions? Dataset and Benchmarking
Summary
BanglaMedVQA is a new dataset for benchmarking Large Language Models (LLMs) and Large Vision Language Models (LVLMs) on medical visual question answering (MedVQA) in Bangla. It consists of clinically validated image-question-answer pairs and addresses a significant gap for one of the world's most widely spoken languages. Initial evaluations of current foundation models, including Gemini and GPT-4.1 mini, show substantially lower performance on BanglaMedVQA than on English MedVQA benchmarks. Models struggle with specialized diagnostic questions and fine-grained medical reasoning, indicating severe limitations in low-resource language contexts. Some open-source models, such as Gemma-3, occasionally show better general performance, but they also fail on clinically complex questions, highlighting the need for improved evaluation methods and model capabilities.
Key takeaway
For AI Scientists and Machine Learning Engineers developing medical AI, the BanglaMedVQA dataset highlights critical performance gaps in low-resource languages. Your current foundation models, even top-tier ones like Gemini and GPT-4.1 mini, are likely insufficient for accurate clinical reasoning in Bangla. Prioritize research into language-specific fine-tuning and advanced reasoning architectures to address these limitations and ensure equitable access to medical AI.
Key insights
The BanglaMedVQA dataset reveals that current LLMs and LVLMs perform poorly on medical visual questions in Bangla.
Principles
- Low-resource languages pose significant challenges for MedVQA.
- Current models lack fine-grained medical reasoning capabilities.
In practice
- Develop specialized models for low-resource medical VQA.
- Focus on improving fine-grained medical reasoning.
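One way to see where a model fails on fine-grained reasoning is to break accuracy down by question category instead of reporting a single score. The sketch below is a minimal illustration, not the paper's evaluation protocol: the record keys (`category`, `answer`, `prediction`), the category names, and the exact-match metric are all assumptions made for this example. It applies Unicode NFC normalization before comparing, since the same Bangla text can be encoded with different code point sequences.

```python
import unicodedata
from collections import defaultdict


def normalize(text: str) -> str:
    """NFC-normalize and collapse whitespace so equivalent Bangla strings compare equal."""
    return " ".join(unicodedata.normalize("NFC", text).split())


def accuracy_by_category(records):
    """Exact-match accuracy per question category.

    `records` is an iterable of dicts with the hypothetical keys
    'category', 'answer', and 'prediction' (assumed for this sketch).
    """
    totals = defaultdict(int)
    correct = defaultdict(int)
    for record in records:
        category = record["category"]
        totals[category] += 1
        if normalize(record["prediction"]) == normalize(record["answer"]):
            correct[category] += 1
    return {category: correct[category] / totals[category] for category in totals}


if __name__ == "__main__":
    # Toy records; the categories and answers are made up for illustration.
    sample = [
        {"category": "general", "answer": "হ্যাঁ", "prediction": " হ্যাঁ "},
        {"category": "general", "answer": "না", "prediction": "হ্যাঁ"},
        {"category": "diagnostic", "answer": "না", "prediction": "না"},
    ]
    for category, score in accuracy_by_category(sample).items():
        print(f"{category}: {score:.2f}")
```

Exact match is a strict metric for open-ended answers; a real evaluation would likely need a softer comparison or clinician review for free-text responses.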
Topics
- BanglaMedVQA Dataset
- Medical Visual Question Answering
- Large Vision Language Models
- Low-Resource Language Performance
- Clinical Reasoning
Best for: AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.