Weighting What Matters: Boosting Sample Efficiency in Medical Report Generation via Token Reweighting

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Medical Devices & Health Technology · Depth: Advanced, medium

Summary

Researchers from the Technical University of Munich and Imperial College London evaluated a token reweighting method for improving sample efficiency when training vision-language models (VLMs) for medical report generation. The approach modifies the standard cross-entropy loss to prioritize tokens that carry clinical meaning, such as the words that distinguish "multiple drusen" from "no drusen." Experiments focused on ophthalmological report generation, using a VLM composed of a pretrained image encoder, a Llama3-3B language model, and a projector layer. The study found that the reweighted loss consistently improved report quality across dataset scales, specifically for age-related macular degeneration (AMD) staging and biomarker identification, and reached comparable performance with up to ten times less training data. The diagnostic keyword set, which includes terms like "healthy" and "fluid," proved particularly effective.

Key takeaway

For AI Scientists developing medical VLMs in data-scarce environments, adding token reweighting to the loss function can substantially improve sample efficiency. The method reaches high report quality, particularly for critical diagnostic tasks like AMD staging and biomarker identification, with significantly less annotated data. Consider curating specific diagnostic keyword sets to maximize the benefit: a reweighted model can potentially outperform models trained on three times more data without reweighting.

Key insights

Token reweighting in VLM training significantly boosts sample efficiency for medical report generation.

Method

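Identify clinically significant tokens using predefined keyword sets (e.g., quantitative, diagnostic). Replace the standard cross-entropy loss with a normalized weighted objective $\mathcal{L}_{\text{tw}}(x,\lambda)=-\frac{1}{\Lambda}\sum_{i=1}^{T}\lambda_{i}\log p_{\theta}(x_{i}\mid x_{<i},I_{\kappa}(x))$ to upweight these tokens. Here $\lambda_{i}$ is the weight assigned to token $x_{i}$, $T$ is the number of tokens in the report, and $\Lambda$ is the normalizing constant, presumably the sum of the weights $\sum_{i=1}^{T}\lambda_{i}$; the summary does not define it, nor the image conditioning term $I_{\kappa}(x)$.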
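The objective above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the keyword list, the boost value of 3.0, the per-token probabilities, and the function names `token_weights` and `reweighted_nll` are all assumptions made for the example, and the normalizer $\Lambda$ is assumed to be the sum of the weights.

```python
import math

# Illustrative keyword set (assumption); the study's actual lists are not reproduced here.
DIAGNOSTIC_KEYWORDS = {"healthy", "fluid", "multiple", "no"}


def token_weights(tokens, keywords, boost=3.0):
    """Assign lambda_i = boost to keyword tokens and 1.0 to all other tokens."""
    return [boost if tok.lower() in keywords else 1.0 for tok in tokens]


def reweighted_nll(log_probs, weights):
    """Compute -(1/Lambda) * sum_i lambda_i * log p_i.

    Lambda is assumed to be the sum of the weights, so uniform weights
    reduce this to the ordinary mean cross-entropy.
    """
    if len(log_probs) != len(weights):
        raise ValueError("log_probs and weights must have the same length")
    normalizer = sum(weights)
    return -sum(w * lp for w, lp in zip(weights, log_probs)) / normalizer


tokens = ["multiple", "drusen", "in", "the", "macula"]
# Hypothetical model probabilities for each target token (made up for the example).
log_probs = [math.log(p) for p in (0.2, 0.9, 0.9, 0.9, 0.9)]

weights = token_weights(tokens, DIAGNOSTIC_KEYWORDS)
uniform_loss = reweighted_nll(log_probs, [1.0] * len(tokens))
weighted_loss = reweighted_nll(log_probs, weights)
print(f"uniform: {uniform_loss:.3f}, reweighted: {weighted_loss:.3f}")
```

In this toy case the model is unsure about the keyword "multiple", so the reweighted loss is larger than the uniform one and an error on that token contributes more to the gradient. In a real training loop the same weighting would be applied to per-token losses from the language model, with weights looked up at the tokenizer level rather than on whole words.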

Best for: AI Scientist, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.