SDR: Set-Distance Rewards for Radiology Report Generation
Summary
SDR (Set-Distance Rewards) is a reinforcement learning approach for chest X-ray report generation that addresses the mismatch between standard exact-match rewards and the unordered nature of radiology findings. It treats each report as a set of sentences, embeds those sentences with a frozen sentence transformer, and uses continuous, permutation-invariant set-to-set distances as rewards. Post-training with SDR via GRPO consistently outperformed supervised fine-tuning and exact-match GRPO across two datasets and models including Qwen3-VL-2B/4B and Gemma3-4B, with average relative improvements of 6.80% on BERTScore, 7.82% on RadGraph F1, and 4.45% on CheXbert F1. SDR also enables test-time best-of-N selection, improving BERTScore by 16.4% over random selection, and supports mid-generation pruning that cuts generated tokens by more than 50% while maintaining quality, even with closed-source LLMs such as GPT-4o-mini.
Key takeaway
If you build medical text generation systems, particularly for radiology reports, consider integrating set-distance rewards. In the reported experiments the approach improved training results over supervised fine-tuning and exact-match GRPO, and it made inference more efficient through best-of-N selection and mid-generation pruning. Treating a report as a set of findings and rewarding it with a set distance can raise report quality and reduce generation cost, even when the generator is a closed-source LLM.
Key insights
Set-distance rewards unify post-training and test-time scaling for unordered text generation, like radiology reports.
Principles
- Radiology reports are unordered sets of findings.
- Set-to-set distances provide continuous, permutation-invariant rewards.
- Frozen sentence transformers can embed report sentences effectively.
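As one concrete illustration of these principles, consider a symmetric Chamfer-style distance. This is an assumption made for the example: the summary does not say which set-to-set distances SDR uses, and the notation is ours. Here G and R are the sets of sentence embeddings for the generated and reference reports, and d is cosine distance.

```latex
D(G, R) = \frac{1}{2}\left(
  \frac{1}{|G|} \sum_{g \in G} \min_{r \in R} d(g, r)
  + \frac{1}{|R|} \sum_{r \in R} \min_{g \in G} d(g, r)
\right),
\qquad
d(g, r) = 1 - \frac{g^{\top} r}{\lVert g \rVert \, \lVert r \rVert}
```

Sums and minima over a set do not depend on the order of its elements, so reordering the sentences of either report leaves D unchanged. D also changes gradually as the embeddings move, whereas an exact-match reward is all-or-nothing. Under this assumption a reward could be taken as -D(G, R).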
Method
Split reports into sentences, embed them with a frozen sentence transformer, then use set-to-set distances between generated and reference embeddings as rewards for GRPO post-training.
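The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the paper's implementation: `toy_embed` is a hypothetical hashed bag-of-words stand-in for the frozen sentence transformer (so the example runs with the standard library only), the Chamfer distance is one possible choice of set-to-set distance, and the sentence splitter and empty-report penalty are our own simplifications.

```python
import hashlib
import math
import re

DIM = 64  # size of the toy embedding space (arbitrary choice)

def split_sentences(report):
    """Split a report into sentences on '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]

def toy_embed(sentence):
    """Hypothetical stand-in for a frozen sentence transformer:
    hashed bag of words, L2-normalised."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", sentence.lower()):
        idx = int(hashlib.md5(word.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]

def cosine_distance(a, b):
    """1 - cosine similarity; inputs are assumed to be unit length."""
    return 1.0 - sum(x * y for x, y in zip(a, b))

def chamfer_distance(set_a, set_b):
    """Symmetric mean nearest-neighbour distance between two sets of vectors
    (assumed choice of set-to-set distance)."""
    if not set_a or not set_b:
        return 1.0  # assumed penalty when either report has no sentences
    a_to_b = sum(min(cosine_distance(a, b) for b in set_b) for a in set_a) / len(set_a)
    b_to_a = sum(min(cosine_distance(a, b) for a in set_a) for b in set_b) / len(set_b)
    return 0.5 * (a_to_b + b_to_a)

def set_distance_reward(generated, reference):
    """Reward = negative set distance between the two reports' sentence sets."""
    gen = [toy_embed(s) for s in split_sentences(generated)]
    ref = [toy_embed(s) for s in split_sentences(reference)]
    return -chamfer_distance(gen, ref)

reference = "No pleural effusion. The heart size is normal."
reordered = "The heart size is normal. No pleural effusion."
print(set_distance_reward(reordered, reference))
print(set_distance_reward("Large right pneumothorax.", reference))
```

The reordered report receives essentially the same reward as an exact copy of the reference, which an exact-match reward would not give it. In a real setup the scalar returned by `set_distance_reward` would be passed to the GRPO trainer as the per-sample reward, with `toy_embed` replaced by an actual frozen sentence encoder.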
In practice
- Apply set-distance rewards for best-of-N candidate selection.
- Implement mid-generation pruning to reduce token output.
- Utilize sentence transformers for embedding medical text.
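Best-of-N selection from the list above might look like the following sketch. No reference report exists at test time, and this summary does not say how SDR scores candidates without one, so the example assumes a consensus rule: keep the candidate whose mean set distance to the other candidates is smallest. The function names, the Chamfer distance, and the toy 2-D "embeddings" are all hypothetical choices for illustration.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity for two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm

def chamfer_distance(set_a, set_b):
    """Symmetric mean nearest-neighbour distance between two non-empty sets."""
    a_to_b = sum(min(cosine_distance(a, b) for b in set_b) for a in set_a) / len(set_a)
    b_to_a = sum(min(cosine_distance(a, b) for a in set_a) for b in set_b) / len(set_b)
    return 0.5 * (a_to_b + b_to_a)

def select_best_of_n(candidates):
    """Return the index of the candidate closest, on average, to all the others.

    `candidates` is a list of reports, each given as a list of sentence
    embeddings. Consensus scoring is an assumption of this sketch.
    """
    if len(candidates) < 2:
        return 0
    scores = []
    for i, cand in enumerate(candidates):
        others = [c for j, c in enumerate(candidates) if j != i]
        scores.append(sum(chamfer_distance(cand, o) for o in others) / len(others))
    return min(range(len(candidates)), key=scores.__getitem__)

# Toy 2-D "sentence embeddings" for three candidate reports.
candidates = [
    [[1.0, 0.0], [0.0, 1.0]],  # findings A and B
    [[0.0, 1.0], [1.0, 0.0]],  # same findings in a different order
    [[-1.0, 0.0]],             # an outlier report
]
print(select_best_of_n(candidates))
```

The two candidates that agree up to sentence order score equally well and the outlier is never selected. The same scores could in principle be computed on partially generated candidates to drop outliers early, which is one way to read the pruning item above, though the summary does not give the actual criterion.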
Topics
- Radiology Report Generation
- Reinforcement Learning
- Vision-Language Models
- Set-Distance Rewards
- Chest X-ray
- Sentence Transformers
- GRPO
Best for: NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.