OpenMedQ: Broad Open Pretraining for Medical Vision-Language Models
Summary
OpenMedQ is a new medical vision-language model (VLM) pretrained on the broadest fully-open medical dataset mix to date: 14 datasets with approximately 3.35 million samples spanning pathology, radiology, microscopy, and text-only clinical QA. The LLaVA-style model pairs a ViT-base vision encoder with a LLaMA-7B language model and achieves state-of-the-art results on medical visual question answering benchmarks. It reaches 75.9 BLEU-1 on PathVQA, surpassing Med-PaLM M variants of up to 562 billion parameters, and matches the best reported BLEU-1 of 64.5 on VQA-MED. When its vision encoder is transferred to 8 unseen medical classification benchmarks, it obtains the highest average macro-F1 (0.757), outperforming BiomedCLIP, PMC-CLIP, PubMedCLIP, and a from-scratch baseline. The code and an interactive demo are publicly available.
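The headline numbers rest on two metrics: BLEU-1 for generated VQA answers and macro-F1 for classification transfer. The sketch below shows how these metrics are commonly computed. It is illustrative only, assuming lowercased whitespace tokenization, sentence-level BLEU-1 against a single reference, and an unweighted mean of macro-F1 across benchmarks. The summary does not specify these details, and the function names are hypothetical.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1 against a single reference answer.

    Assumes lowercased whitespace tokenization; the paper's exact
    tokenization and aggregation are not given in this summary.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand or not ref:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate word's count by how often it appears in the reference.
    clipped = sum(min(n, ref_counts[w]) for w, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty: answers no longer than the reference are discounted when shorter.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1 over every label in y_true or y_pred."""
    labels = set(y_true) | set(y_pred)
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def average_macro_f1(benchmarks: list) -> float:
    """Assumed aggregation: plain mean of macro-F1 over (y_true, y_pred) pairs."""
    per_benchmark = [macro_f1(t, p) for t, p in benchmarks]
    return sum(per_benchmark) / len(per_benchmark)


print(round(bleu1("the the the", "the cat"), 3))  # 0.333
print(round(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]), 3))  # 0.733
```

Clipping is what keeps a degenerate answer like "the the the" from scoring full precision. Macro-F1 weights every class equally, so rare findings count as much as common ones. This matters on imbalanced medical classification sets.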
Key takeaway
For AI scientists and machine learning engineers building medical vision-language models, OpenMedQ shows that a model pretrained on broad, open data can outperform models with far more parameters. Consider assembling diverse, publicly available medical datasets to build robust VLMs rather than relying solely on proprietary, large-scale models. Inspect the released code and interactive demo to understand the architecture, then adapt its pretraining strategy to your own medical imaging applications.
Key insights
Broad, open medical pretraining data enables state-of-the-art VLM performance with fewer parameters.
Principles
- Data breadth is a competitive lever for medical VLMs.
- Open-source models can outperform larger proprietary ones.
- Identical downstream recipes isolate pretraining impact.
Method
OpenMedQ combines a BiomedCLIP-initialized ViT-base vision encoder with a PMC-LLaMA-initialized LLaMA-7B language model in a LLaVA-style architecture. It is fine-tuned with LoRA (rank r=8) using a next-token cross-entropy objective on the 14-dataset medical mix.
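To make the LoRA and loss setup concrete, here is a minimal NumPy sketch. It is not OpenMedQ's code. The class and function names are hypothetical, and alpha=16 and the initialization scheme are assumptions. The summary states only the rank (r=8) and the next-token cross-entropy objective. It does not say which layers carry adapters or whether prompt tokens are masked out of the loss.

```python
import numpy as np


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A.

    Hypothetical sketch: alpha and the init scheme are assumptions; the
    source specifies only rank r = 8.
    """

    def __init__(self, weight: np.ndarray, r: int = 8, alpha: float = 16.0, seed: int = 0):
        d_out, d_in = weight.shape
        rng = np.random.default_rng(seed)
        self.weight = weight  # pretrained weight, kept frozen
        self.A = rng.normal(0.0, 0.01, size=(r, d_in))  # trainable down-projection
        self.B = np.zeros((d_out, r))  # trainable up-projection, zero-init so the update starts at 0
        self.scale = alpha / r

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # x: (batch, d_in) -> (batch, d_out); frozen path plus scaled low-rank path.
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def trainable_params(self) -> int:
        return self.A.size + self.B.size


def lora_param_count(d_in: int, d_out: int, r: int = 8) -> int:
    # A is (r, d_in) and B is (d_out, r).
    return r * (d_in + d_out)


def next_token_cross_entropy(logits: np.ndarray, tokens: np.ndarray) -> float:
    """Mean cross-entropy of predicting tokens[t + 1] from position t.

    logits: (seq_len, vocab) model outputs; tokens: (seq_len,) integer ids.
    """
    shifted_logits = logits[:-1]  # predictions made at positions 0..T-2
    targets = tokens[1:]          # the tokens that actually came next
    # Numerically stable log-softmax over the vocabulary.
    z = shifted_logits - shifted_logits.max(axis=-1, keepdims=True)
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return float(-log_probs[np.arange(len(targets)), targets].mean())


layer = LoRALinear(np.eye(4), r=2)
print(layer(np.ones((1, 4))))  # [[1. 1. 1. 1.]] because B starts at zero
print(lora_param_count(4096, 4096))  # 65536
print(round(next_token_cross_entropy(np.zeros((3, 5)), np.array([0, 1, 2])), 4))  # 1.6094
```

LLaMA-7B has a hidden size of 4096. One r=8 adapter on a 4096 x 4096 projection therefore trains 65,536 parameters, or 1/256 (about 0.39%) of the 16,777,216 in the frozen matrix. This saving is the main reason LoRA is attractive for adapting a 7B model.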
In practice
- Inspect OpenMedQ's interactive demo for qualitative assessment.
- Reuse OpenMedQ's code and weights for medical VLM development.
- Extend OpenMedQ's architecture with new medical datasets.
Topics
- Medical Vision-Language Models
- Open Science
- Medical Image Classification
- LLaVA
- LLaMA-7B
- PathVQA
Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.