Reinforcement Learning Improves LLM Accuracy and Reasoning in Disease Classification from Radiology Reports

Source: cs.AI updates on arXiv.org · Field: Technology & Digital: Artificial Intelligence & Machine Learning, Data Science & Analytics, Medical Natural Language Processing · Depth: Expert, extended

Summary

A two-stage training approach improves both disease classification accuracy and reasoning quality in lightweight Large Language Models (LLMs) that analyze radiology reports. The method first applies Supervised Fine-Tuning (SFT) using only disease labels, then applies Group Relative Policy Optimization (GRPO) to refine predictions and encourage explicit reasoning without direct reasoning supervision. The framework was evaluated on three radiologist-annotated datasets (MIMIC-CXR, NIH-CXR, and MIDRC) using three lightweight LLMs: LLaMA 3.1-8B-Instruct, Qwen 2.5-3B-Instruct, and Phi-3 Mini-128K-Instruct. SFT consistently outperformed the baselines, and GRPO further raised micro-F1 in eight of the nine model-dataset cohorts, with gains of up to 13.2%. GRPO also restored and improved reasoning recall and comprehensiveness, which SFT alone often degraded through catastrophic forgetting.
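Micro-F1 is the headline metric here, so a short sketch of how it is computed may help. The example below assumes a multi-label setting in which each report carries a set of disease labels; the label names and sample data are illustrative and are not taken from the paper.

```python
def micro_f1(y_true, y_pred):
    """Micro-averaged F1 over multi-label predictions.

    y_true, y_pred: lists of sets of labels, one set per report.
    Counts are pooled across all reports and labels before computing F1.
    """
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        tp += len(truth & pred)   # labels predicted and present
        fp += len(pred - truth)   # labels predicted but absent
        fn += len(truth - pred)   # labels present but missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Illustrative data only (hypothetical labels, not from the datasets above).
y_true = [{"edema", "pneumonia"}, {"atelectasis"}, set()]
y_pred = [{"edema"}, {"atelectasis", "edema"}, set()]
score = micro_f1(y_true, y_pred)
print(f"micro-F1 = {score:.3f}")
```

Because counts are pooled before the ratio is taken, frequent labels weigh more heavily than rare ones, which is worth keeping in mind when comparing cohorts with different label distributions.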

Key takeaway

AI engineers developing clinical NLP solutions should consider adding a reinforcement learning stage such as GRPO after initial supervised fine-tuning. This approach can improve both the classification accuracy and the explainability of lightweight LLMs in radiology, and it counters the catastrophic forgetting of reasoning ability often seen with SFT alone. The resulting models can offer more reliable and transparent diagnostic support, which helps build trust in AI-assisted clinical decision-making.

Key insights

Reinforcement learning with GRPO enhances LLM accuracy and reasoning in medical text classification without explicit reasoning supervision.
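The "group relative" part of GRPO can be illustrated briefly. For each prompt, several completions are sampled and scored, and each completion's advantage is its reward standardized against the other completions in the same group, so no separate value model is needed. The sketch below assumes population standard deviation and a small epsilon for stability; implementations differ on both details, and the reward values are made up.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group of sampled completions.

    rewards: scalar rewards for completions sampled from the same prompt.
    eps: assumed stabilizer that avoids division by zero when all rewards match.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one radiology report.
advs = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
print([round(a, 3) for a in advs])
```

Completions that score above the group mean receive positive advantages and are reinforced; those below the mean are discouraged. When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.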

Principles

Method

The pipeline has two stages. First, the model is fine-tuned with SFT on disease labels only. Second, GRPO is applied with a reward function that scores classification accuracy and adherence to an output format that includes explicit reasoning. At inference time, multiple sampled outputs are consolidated through majority voting and summarization.
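A reward of the kind described above, together with majority voting over several sampled outputs, might be sketched as follows. Everything specific here is an assumption made for illustration: the `<think>`/`<answer>` tag format, the 0.2 format weight, the use of Jaccard overlap as the accuracy term, and the helper names `parse_labels`, `reward`, and `majority_vote`. The paper's actual reward design and prompt format may differ, and the summarization step for reasoning text is omitted.

```python
import re
from collections import Counter

# Assumed output format: free-text reasoning, then a comma-separated label list.
PATTERN = re.compile(
    r"^\s*<think>(.+?)</think>\s*<answer>(.*?)</answer>\s*$", re.DOTALL
)


def parse_labels(completion):
    """Return the predicted label set, or None if the format is violated."""
    m = PATTERN.match(completion)
    if m is None:
        return None
    return {s.strip().lower() for s in m.group(2).split(",") if s.strip()}


def reward(completion, gold, format_weight=0.2):
    """Hypothetical reward: format bonus plus weighted label-set overlap."""
    pred = parse_labels(completion)
    if pred is None:
        return 0.0  # malformed output earns nothing
    union = pred | gold
    accuracy = len(pred & gold) / len(union) if union else 1.0
    return format_weight + (1 - format_weight) * accuracy


def majority_vote(completions):
    """Keep labels predicted by more than half of the well-formed outputs."""
    parsed = [p for p in map(parse_labels, completions) if p is not None]
    if not parsed:
        return set()
    counts = Counter(label for p in parsed for label in p)
    return {label for label, c in counts.items() if c > len(parsed) / 2}


samples = [
    "<think>Bibasilar opacities with effusion.</think><answer>Pneumonia, Edema</answer>",
    "<think>Interstitial markings are increased.</think><answer>Edema</answer>",
    "<think>Volume loss at the bases.</think><answer>Edema, Atelectasis</answer>",
]
gold = {"pneumonia", "edema"}
print([round(reward(s, gold), 2) for s in samples])
print(majority_vote(samples))
```

Tying the format bonus to the presence of a non-empty reasoning span is one way to reward explicit reasoning without supervising its content, which matches the spirit of the method as summarized, though not necessarily its details.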

Best for: AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.