SDR: Set-Distance Rewards for Radiology Report Generation

2026-05-30 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Health & Medical Research · Depth: Expert, quick

Summary

SDR (Set-Distance Rewards) is a reinforcement learning approach for chest X-ray report generation. It targets a mismatch in existing methods: standard exact-match rewards are a poor fit for radiology findings, which have no inherent order. The method treats a report as a set of sentences, embeds each sentence with a frozen sentence transformer, and uses continuous, permutation-invariant set-to-set distances as rewards. Post-training with SDR via GRPO consistently outperformed supervised fine-tuning and exact-match GRPO across two datasets and models including Qwen3-VL-2B/4B and Gemma3-4B, with average relative improvements of 6.80% on BERTScore, 7.82% on RadGraph F1, and 4.45% on CheXbert F1. The same distance also works at test time. Best-of-N selection with SDR improved BERTScore by 16.4% over random selection, and mid-generation pruning reduced generated tokens by over 50% while maintaining quality, even with closed-source LLMs such as GPT-4o-mini.

Key takeaway

If you build medical text generation systems, particularly for radiology reports, consider set-distance rewards. In the reported experiments the approach outperformed supervised fine-tuning and exact-match GRPO in post-training, and at inference it cut generated tokens by over 50% while maintaining quality. Treating a report as a set of sentences and rewarding set-level similarity can raise report quality and reduce generation cost, even with closed-source LLMs such as GPT-4o-mini.

Key insights

Set-distance rewards serve both post-training and test-time scaling for text whose content is unordered, such as radiology reports.

Principles

Radiology reports are unordered sets of findings.
Set-to-set distances provide continuous, permutation-invariant rewards.
Frozen sentence transformers can embed report sentences effectively.

Method

Split reports into sentences, embed them with a frozen sentence transformer, then use set-to-set distances between generated and reference embeddings as rewards for GRPO post-training.
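
The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The summary does not name the specific set-to-set distance, so a symmetric Chamfer distance over cosine distances is assumed here as one permutation-invariant choice. `toy_embed` is a hypothetical hashed bag-of-words stand-in for the frozen sentence transformer, used only to keep the example self-contained. All function names are illustrative.

```python
import hashlib
import re

import numpy as np

DIM = 256  # embedding width of the toy embedder (assumption)


def split_sentences(report: str) -> list[str]:
    """Naive splitter on sentence-ending punctuation followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", report.strip())
    return [p for p in parts if p]


def toy_embed(sentence: str) -> np.ndarray:
    """Stand-in for a frozen sentence transformer: hashed bag-of-words,
    L2-normalised. A real system would call a pretrained encoder here."""
    vec = np.zeros(DIM)
    for token in re.findall(r"[a-z0-9]+", sentence.lower()):
        idx = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[idx] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def embed_report(report: str) -> np.ndarray:
    """Return an (n_sentences, DIM) array; row order carries no meaning."""
    sentences = split_sentences(report)
    if not sentences:
        return np.zeros((0, DIM))
    return np.stack([toy_embed(s) for s in sentences])


def chamfer_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Symmetric Chamfer distance over cosine distances (assumed choice).
    Each sentence is matched to its nearest neighbour in the other set,
    so reordering the rows of either set leaves the result unchanged."""
    if len(a) == 0 or len(b) == 0:
        # Assumed convention: two empty sets match, one empty set is maximal.
        return 0.0 if len(a) == len(b) else 1.0
    cos_dist = 1.0 - a @ b.T  # rows are unit vectors, so a @ b.T is cosine similarity
    return float(0.5 * (cos_dist.min(axis=1).mean() + cos_dist.min(axis=0).mean()))


def set_distance_reward(generated: str, reference: str) -> float:
    """Continuous reward: negated set distance, so closer reports score higher."""
    return -chamfer_distance(embed_report(generated), embed_report(reference))


reference = "No pleural effusion. Heart size is normal."
reordered = "Heart size is normal. No pleural effusion."
unrelated = "Large left pneumothorax."

print(set_distance_reward(reordered, reference))
print(set_distance_reward(unrelated, reference))
```

An exact-match reward would penalise `reordered` for differing from `reference` as a string, whereas the set-based reward treats the two as equivalent. In a GRPO loop, a scalar like this would be computed for each sampled report in a group.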

In practice

Apply set-distance rewards for best-of-N candidate selection.
Implement mid-generation pruning to reduce token output.
Use sentence transformers to embed medical text.
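
The selection and pruning items above can be sketched as follows. No reference report exists at test time, and the summary does not say what candidates are scored against. This sketch assumes consensus scoring, where each candidate is rated by its mean set distance to the other candidates, which may differ from the original method. The input is a precomputed symmetric matrix of pairwise set distances. The function names and the example values are hypothetical.

```python
import numpy as np


def consensus_scores(pairwise: np.ndarray) -> np.ndarray:
    """Mean distance from each candidate to all others (lower is better)."""
    n = pairwise.shape[0]
    if n < 2:
        return np.zeros(n)
    return (pairwise.sum(axis=1) - np.diag(pairwise)) / (n - 1)


def best_of_n(pairwise: np.ndarray) -> int:
    """Index of the candidate closest, on average, to the rest."""
    return int(np.argmin(consensus_scores(pairwise)))


def prune(pairwise: np.ndarray, keep: int) -> list[int]:
    """Indices of the `keep` most central candidates. Applied to partial
    generations, the remaining candidates would be stopped early."""
    order = np.argsort(consensus_scores(pairwise), kind="stable")
    return sorted(int(i) for i in order[:keep])


# Made-up distances between four candidate reports; candidate 3 is an outlier.
pairwise = np.array([
    [0.00, 0.10, 0.20, 0.90],
    [0.10, 0.00, 0.10, 0.80],
    [0.20, 0.10, 0.00, 0.85],
    [0.90, 0.80, 0.85, 0.00],
])

chosen = best_of_n(pairwise)
survivors = prune(pairwise, keep=2)
print(chosen, survivors)
```

Because this scoring needs only the candidates' text, it does not depend on access to model weights or logits, which is consistent with the reported use with closed-source LLMs.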

Topics

Radiology Report Generation
Reinforcement Learning
Vision-Language Models
Set-Distance Rewards
Chest X-ray
Sentence Transformers
GRPO

Best for: NLP Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.