Weighting What Matters: Boosting Sample Efficiency in Medical Report Generation via Token Reweighting
Summary
Researchers from the Technical University of Munich and Imperial College London evaluated a token reweighting method to improve sample efficiency when training vision-language models (VLMs) for medical report generation. The approach modifies the standard cross-entropy loss so that it prioritizes semantically salient tokens of high clinical importance, such as the words that distinguish "multiple drusen" from "no drusen." Experiments focused on ophthalmological report generation, using a VLM composed of a pretrained image encoder, a Llama3-3B language model, and a projector layer. The reweighted loss consistently improved report quality, particularly for AMD staging and biomarker identification, and reached performance comparable to unweighted training with up to ten times less training data across dataset scales. The diagnostic keyword set, which includes terms such as "healthy" and "fluid," proved particularly effective.
Key takeaway
For AI scientists developing medical VLMs in data-scarce settings, adding token reweighting to the loss function can substantially improve sample efficiency. It yields high report quality on critical diagnostic tasks such as AMD staging and biomarker identification with significantly less annotated data. Curating a dedicated diagnostic keyword set maximizes the benefit and can let a reweighted model outperform an unweighted model trained on three times more data.
Key insights
Token reweighting in VLM training significantly boosts sample efficiency for medical report generation.
Principles
- Not all tokens in medical reports carry equal clinical relevance.
- Loss functions should reflect the semantic importance of each token.
Method
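Identify clinically significant tokens using predefined keyword sets (e.g., quantitative, diagnostic). Replace the standard cross-entropy loss with a normalized weighted objective $\mathcal{L}_{\text{tw}}(x,\lambda)=-\frac{1}{\Lambda}\sum_{i=1}^{T}\lambda_{i}\log p_{\theta}(x_{i}\mid x_{<i},I_{\kappa}(x))$ that upweights these tokens, where $T$ is the report length, $\lambda_i$ is the weight of token $x_i$, $\Lambda$ is a normalizing constant, and $I_{\kappa}(x)$ is the input image associated with report $x$.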
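A minimal pure-Python sketch of the objective above for a single report whose per-token log-probabilities have already been computed. It assumes the normalizer is $\Lambda = \sum_i \lambda_i$; the function name `token_weighted_nll`, the toy probabilities, and the weight of 5.0 on the keyword token are hypothetical and not taken from the paper.

```python
import math
from typing import Sequence


def token_weighted_nll(log_probs: Sequence[float], weights: Sequence[float]) -> float:
    """Normalized token-weighted negative log-likelihood (illustrative sketch).

    log_probs[i] stands for log p_theta(x_i | x_<i, image) of the target token;
    weights[i] is its weight lambda_i. The weighted sum is divided by
    Lambda = sum(weights), which is an assumption about the normalizer.
    """
    if len(log_probs) != len(weights):
        raise ValueError("log_probs and weights must have the same length")
    total_weight = sum(weights)  # Lambda
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(w * lp for w, lp in zip(weights, log_probs))
    return -weighted / total_weight


# Toy report with four target tokens; token index 2 is a (hypothetical)
# clinical keyword that the model currently predicts poorly.
log_probs = [math.log(0.9), math.log(0.8), math.log(0.2), math.log(0.9)]
uniform = token_weighted_nll(log_probs, [1.0, 1.0, 1.0, 1.0])     # plain mean cross-entropy
reweighted = token_weighted_nll(log_probs, [1.0, 1.0, 5.0, 1.0])  # keyword upweighted
print(uniform, reweighted)
```

With every $\lambda_i = 1$, $\Lambda = T$ and the objective reduces to the ordinary mean cross-entropy. Raising the weight on a poorly predicted keyword token lets that token dominate the loss, so `reweighted` exceeds `uniform` here. In an actual training loop the same computation would run on framework tensors so that gradients flow back through the model.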
In practice
- Use diagnostic keywords for highest impact.
- Apply token reweighting to data-constrained medical domains.
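Using diagnostic keywords in practice requires a step that turns a keyword set into the per-token weights $\lambda_i$. The following is a minimal sketch that assumes whitespace tokenization and a single fixed boost value. The keyword set (built from "healthy", "fluid", and "drusen" as mentioned in this summary), the function `keyword_weights`, and the boost of 2.0 are all hypothetical.

```python
from typing import Iterable, List, Set

# Hypothetical diagnostic keyword set; terms drawn from the summary above.
DIAGNOSTIC_KEYWORDS: Set[str] = {"healthy", "fluid", "drusen"}


def keyword_weights(
    tokens: Iterable[str],
    keywords: Set[str],
    boost: float = 2.0,
    base: float = 1.0,
) -> List[float]:
    """Return one weight per token: `boost` for keywords, `base` otherwise.

    Matching is case-insensitive and ignores leading/trailing punctuation.
    The boost value is an illustrative assumption, not the paper's setting.
    """
    weights = []
    for tok in tokens:
        word = tok.strip(".,;:").lower()
        weights.append(boost if word in keywords else base)
    return weights


report = "Retina shows multiple drusen, no fluid.".split()
print(keyword_weights(report, DIAGNOSTIC_KEYWORDS))
# → [1.0, 1.0, 1.0, 2.0, 1.0, 2.0]
```

A real VLM uses a subword tokenizer, so a matched word's weight would have to be copied to each of its subword tokens. That mapping is not shown here.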
Topics
- Vision-Language Models
- Medical Report Generation
- Sample Efficiency
- Token Reweighting
- Ophthalmology
Best for: AI Scientist, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.