OpenMedQ: Broad Open Pretraining for Medical Vision-Language Models

2026-06-12 · Source: cs.AI updates on arXiv.org · Field: Health & Wellbeing — Artificial Intelligence & Machine Learning, Health & Medical Research · Depth: Advanced, short

Summary

OpenMedQ is a new medical vision-language model (VLM) pretrained on the broadest fully open medical dataset mix to date: 14 datasets with approximately 3.35 million samples spanning pathology, radiology, microscopy, and text-only clinical QA. The LLaVA-style model pairs a ViT-base vision encoder with a LLaMA-7B language model. It reaches 75.9 BLEU-1 on PathVQA, surpassing Med-PaLM M variants of up to 562 billion parameters, and matches the best reported BLEU-1 of 64.5 on VQA-MED. When its vision encoder is transferred to 8 unseen medical classification benchmarks, it obtains the highest average macro-F1 (0.757), outperforming BiomedCLIP, PMC-CLIP, PubMedCLIP, and a from-scratch baseline. The code and an interactive demo are publicly available.
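For readers less familiar with the two metrics quoted above, the sketch below shows how BLEU-1 and macro-F1 are commonly computed. It is a minimal illustration under stated assumptions, not the paper's evaluation script: the summary does not say how answers are tokenized or whether BLEU-1 is aggregated per sentence or per corpus, so the whitespace tokenization, lowercasing, and single-reference setup here are assumptions, and the function names `bleu1` and `macro_f1` are hypothetical.

```python
from collections import Counter
import math


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times the brevity penalty.

    Assumption: lowercased whitespace tokenization and a single reference.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Clip each candidate token's count by how often it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)
    # The brevity penalty lowers the score of answers shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * precision


def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-class F1 over every class seen in labels or predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never appears.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because macro-F1 weights every class equally, a rare class counts as much as a common one, which is one reason it is often preferred over accuracy on imbalanced medical classification benchmarks.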

Key takeaway

For AI scientists and machine learning engineers developing medical vision-language models, OpenMedQ shows that a 7B-parameter model pretrained on broad, open data can outperform models with far more parameters. Consider building on diverse, publicly available medical datasets rather than relying solely on large proprietary models. Inspect the released code and interactive demo to understand the architecture, then adapt the pretraining strategy to your own medical imaging applications.

Key insights

Broad, open medical pretraining data enables state-of-the-art VLM performance with fewer parameters.

Principles

Data breadth is a competitive lever for medical VLMs.
Open-source models can outperform larger proprietary ones.
Identical downstream recipes isolate pretraining impact.

Method

OpenMedQ pairs a BiomedCLIP-initialized ViT-base vision encoder with a PMC-LLaMA-initialized LLaMA-7B language model. It is fine-tuned with LoRA (r=8) using a next-token cross-entropy loss on 14 diverse medical datasets.
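To make the LoRA setting concrete, here is a minimal pure-Python sketch of a rank-8 adapter on a single linear layer. It is an illustration of the general LoRA technique, not OpenMedQ's code: the summary gives only r=8, so the scaling factor `alpha=16`, the initialization scale, which layers receive adapters, and the names `LoRALinear`, `lora_param_count`, and `matvec` are all assumptions made for this example.

```python
import random


def lora_param_count(d_in: int, d_out: int, r: int) -> int:
    """Trainable parameters added by one adapter: A is r x d_in, B is d_out x r."""
    return r * (d_in + d_out)


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha / r) * B @ A."""

    def __init__(self, weight, r=8, alpha=16, seed=0):
        rng = random.Random(seed)
        d_out, d_in = len(weight), len(weight[0])
        self.weight = weight  # frozen: never updated during fine-tuning
        # A gets a small random init; B starts at zero, so the adapter is a
        # no-op at step 0 and training begins from the pretrained behaviour.
        self.a = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
        self.b = [[0.0] * r for _ in range(d_out)]
        self.scale = alpha / r  # alpha is an assumed value, not from the source

    def forward(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.b, matvec(self.a, x))
        return [y + self.scale * u for y, u in zip(base, update)]
```

For a 4096 x 4096 projection (4096 is LLaMA-7B's hidden size), `lora_param_count(4096, 4096, 8)` gives 65,536 trainable parameters against 16,777,216 in the frozen matrix, roughly 0.4%, which is why LoRA makes adapting a 7B model far cheaper than full fine-tuning.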

In practice

Inspect OpenMedQ's interactive demo for qualitative assessment.
Reuse OpenMedQ's code and weights for medical VLM development.
Extend OpenMedQ's architecture with new medical datasets.

Topics

Medical Vision-Language Models
Open Science
Medical Image Classification
LLaVA
LLaMA-7B
PathVQA

Code references

gevaertlab/OpenMedQ

Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.