OpenMedQ: Broad Open Pretraining for Medical Vision-Language Models

Source: cs.AI updates on arXiv.org · Field: Health & Wellbeing — Artificial Intelligence & Machine Learning, Health & Medical Research · Depth: Advanced, short

Summary

OpenMedQ is a medical vision-language model (VLM) pretrained on what its authors describe as the broadest fully open medical dataset mix to date: 14 datasets totaling approximately 3.35 million samples across pathology, radiology, microscopy, and text-only clinical QA. The model follows the LLaVA design, pairing a ViT-base vision encoder with a LLaMA-7B language model. It reaches 75.9 BLEU-1 on PathVQA, surpassing Med-PaLM M variants of up to 562 billion parameters, and matches the best reported BLEU-1 of 64.5 on VQA-MED. When its vision encoder is transferred to 8 unseen medical classification benchmarks, it obtains the highest average macro-F1 (0.757), ahead of BiomedCLIP, PMC-CLIP, PubMedCLIP, and a from-scratch baseline. The code and an interactive demo are publicly available.
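
The two metrics quoted above can be sketched in a few lines of plain Python. This is an illustrative sketch, not the paper's evaluation script: the whitespace tokenization, lowercasing, single-reference scoring, and the function names `bleu1` and `macro_f1` are all assumptions made here. Published BLEU-1 figures are usually reported on a 0 to 100 scale and may be aggregated at the corpus level.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    Assumption: lowercased whitespace tokens and a single reference answer.
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate token is credited at most as often as it appears in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty discourages answers shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return precision * bp


def macro_f1(y_true: list, y_pred: list) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Macro averaging matters for medical classification benchmarks because class imbalance is common: a model that ignores a rare finding is penalized as heavily as one that misses a frequent one.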

Key takeaway

For AI scientists and machine learning engineers developing medical vision-language models, OpenMedQ shows that broad, open pretraining data can outperform models with far more parameters on the reported benchmarks. Consider building on diverse, publicly available medical datasets rather than relying solely on large proprietary models. Inspect the released code and interactive demo to understand the architecture and adapt the pretraining strategy to your own medical imaging applications.

Key insights

Broad, open medical pretraining data enables state-of-the-art VLM performance with fewer parameters.

Method

OpenMedQ pairs a ViT-base vision encoder initialized from BiomedCLIP with a LLaMA-7B language model initialized from PMC-LLaMA. The model is fine-tuned with LoRA (rank r=8) using a next-token cross-entropy loss on 14 diverse medical datasets.
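
To make the two training ingredients concrete, here is a minimal pure-Python sketch of a LoRA update and a next-token cross-entropy loss. It is not the released OpenMedQ code. The summary gives only the rank (r=8); the scaling factor `alpha`, the choice of which weight matrices receive adapters, and all function names are assumptions for illustration. The hidden size of 4096 used in the parameter count is the standard LLaMA-7B width.

```python
import math


def lora_trainable_params(d_in: int, d_out: int, r: int) -> int:
    """Parameters in the low-rank pair A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


def matvec(m: list, v: list) -> list:
    """Plain matrix-vector product on nested lists."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def lora_forward(W: list, A: list, B: list, x: list, alpha: float, r: int) -> list:
    """y = W x + (alpha / r) * B (A x). W stays frozen; only A and B are trained."""
    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / r  # alpha is an assumed hyperparameter, not given in the summary
    return [b + scale * u for b, u in zip(base, update)]


def next_token_cross_entropy(logits: list, targets: list) -> float:
    """Mean negative log-likelihood of each target token under softmax(logits).

    logits[i] holds the scores over the vocabulary at position i, and
    targets[i] is the index of the token that actually follows.
    """
    total = 0.0
    for row, t in zip(logits, targets):
        m = max(row)  # subtract the max for numerical stability
        log_z = m + math.log(sum(math.exp(v - m) for v in row))
        total += log_z - row[t]
    return total / len(targets)


# One 4096 x 4096 projection adapted at rank 8 versus full fine-tuning.
adapter = lora_trainable_params(4096, 4096, 8)  # 65536
full = 4096 * 4096                              # 16777216
print(adapter, full, adapter / full)            # 65536 16777216 0.00390625
```

Under these assumptions, each adapted projection trains under 0.4% of the weights that full fine-tuning would touch, which is what makes tuning a 7B-parameter language model on a 3.35-million-sample mix practical on modest hardware.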

Best for: Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.