OpenMedQ: Broad Open Pretraining for Medical Vision-Language Models

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision & Pattern Recognition, Health & Medical Research · Depth: Expert, quick

Summary

OpenMedQ is a medical vision-language model pretrained on what is described as the broadest fully open medical pretraining mix to date: 14 datasets totaling approximately 3.35 million samples and spanning pathology, radiology, microscopy, and text-only clinical QA. It reaches a BLEU-1 score of 75.9 on PathVQA, surpassing Med-PaLM M variants of up to 562 billion parameters, the largest of which is about 80 times OpenMedQ's size. It also matches the best reported VQA-MED BLEU-1 score of 64.5. When its vision encoder is transferred to eight unseen medical classification benchmarks under an identical downstream recipe, it obtains the highest average macro-F1 score, 0.757, ahead of PubMedCLIP (0.746), BiomedCLIP (0.745), PMC-CLIP (0.745), and a from-scratch baseline (0.616). The code and an interactive demo are publicly available.
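
The two metrics quoted above, BLEU-1 and macro-F1, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: whitespace tokenization, lowercasing, a single reference answer per question, and the standard brevity penalty for BLEU-1. The OpenMedQ evaluation code may tokenize or aggregate differently, and the function names `bleu1` and `macro_f1` are hypothetical, not taken from the released code.

```python
import math
from collections import Counter


def bleu1(candidate: str, reference: str) -> float:
    """BLEU-1 for one answer: clipped unigram precision times brevity penalty.

    Assumption: whitespace tokenization and lowercasing (illustrative only).
    """
    cand = candidate.lower().split()
    ref = reference.lower().split()
    if not cand:
        return 0.0
    ref_counts = Counter(ref)
    # Each candidate token is credited at most as often as it occurs in the reference.
    clipped = sum(min(n, ref_counts[tok]) for tok, n in Counter(cand).items())
    precision = clipped / len(cand)
    # Brevity penalty: candidates shorter than the reference are penalized.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1.0 - len(ref) / len(cand))
    return bp * precision


def macro_f1(y_true, y_pred) -> float:
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        # A class with no true or predicted positives contributes 0.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy inputs, not data from the paper.
    print(bleu1("yes", "yes it is"))
    print(macro_f1([0, 0, 1, 1], [0, 1, 1, 1]))
```

Both functions return values between 0 and 1; multiply BLEU-1 by 100 to compare with figures such as 75.9. Because macro-F1 weights every class equally, it is sensitive to performance on rare classes, which matters for imbalanced medical classification benchmarks.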

Key takeaway

For AI scientists and machine learning engineers developing medical vision-language models, OpenMedQ demonstrates that broad pretraining on diverse, fully open medical data can yield state-of-the-art results, even surpassing models roughly 80 times larger. Consider adopting a similarly broad pretraining mix and using OpenMedQ's released code and demo as a strong baseline for your own projects; a smaller model that matches larger ones may also reduce compute costs.

Key insights

Broad open pretraining on diverse medical data enables SOTA performance with smaller models.

Principles

Method

Pretraining a medical VLM on 14 diverse datasets (~3.35M samples) covering pathology, radiology, microscopy, and clinical QA.

Best for: AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.