Teaching LLMs Brazilian Healthcare: Injecting Knowledge from Official Clinical Guidelines
Summary
Researchers adapted Qwen2.5-14B-Instruct to Brazilian clinical guidelines, addressing a gap in LLM performance on the protocols of the country's Unified Health System (SUS). Starting from 178 official guidelines (5.4M tokens), they used four generator LLMs to produce approximately 70M tokens of synthetic data in three formats: rephrases, wiki-style articles, and question-answer pairs. This data was used for continual pre-training, followed by Group Relative Policy Optimization (GRPO). The team introduced two new benchmarks: HealthBench-BR, with 1,780 true/false clinical assertions, and PCDT-QA, with 890 open-ended clinical questions. Their best model achieved 83.9% on HealthBench-BR and 85.4% on PCDT-QA, outperforming larger models such as GPT-5.2, Claude Sonnet 4.6, and Gemini 3.1 Pro, as well as Google AI Overview's RAG system. Ablation studies confirmed that both generator diversity and reinforcement learning contributed to these gains.
Key takeaway
For AI engineers building clinical decision support systems for non-English healthcare contexts, this research shows that targeted adaptation of an open LLM can surpass larger, general-purpose models. Prioritize generating diverse, multi-format synthetic data from official guidelines, and consider reinforcement learning to improve factual accuracy and reduce sycophancy, especially where critical medical information is involved. The approach offers a path to deploying smaller, more transparent models with stronger domain-specific knowledge.
Key insights
Domain-specific synthetic data and reinforcement learning significantly enhance LLM performance on specialized clinical knowledge.
Principles
- Diverse synthetic data formats improve knowledge absorption.
- Reinforcement learning can correct model biases like sycophancy.
- Full fine-tuning often outperforms LoRA for domain adaptation.
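To make the sycophancy principle measurable on a true/false benchmark such as HealthBench-BR, one option is to report, next to accuracy, how often a model agrees with assertions that are actually false. The sketch below is illustrative only: the function name and the "false agreement rate" metric are assumptions, not the paper's evaluation code or its stated definition of sycophancy.

```python
def score_true_false(predictions: list[bool], gold: list[bool]) -> dict[str, float]:
    """Score true/false answers against gold labels.

    Returns overall accuracy plus the share of false assertions that the
    model nevertheless marked as true (an assumed proxy for sycophancy).
    """
    if len(predictions) != len(gold):
        raise ValueError("predictions and gold must have the same length")
    if not gold:
        raise ValueError("at least one item is required")

    correct = sum(p == g for p, g in zip(predictions, gold))

    # Model answers on the items whose gold label is False.
    answers_on_false = [p for p, g in zip(predictions, gold) if not g]
    if answers_on_false:
        false_agreement_rate = sum(answers_on_false) / len(answers_on_false)
    else:
        false_agreement_rate = 0.0

    return {
        "accuracy": correct / len(gold),
        "false_agreement_rate": false_agreement_rate,
    }


if __name__ == "__main__":
    preds = [True, True, True, False]
    labels = [True, False, False, False]
    print(score_true_false(preds, labels))
```

A model that says "true" to almost everything can still look acceptable on accuracy if the benchmark is unbalanced, which is why the second number is worth tracking separately.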
Method
The method involves generating multi-format synthetic data from clinical guidelines using diverse LLM generators, followed by continual pre-training and reinforcement learning with Group Relative Policy Optimization (GRPO).
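The data-generation step can be pictured as a cross product of guidelines, generator models, and output formats. The following is a minimal sketch under stated assumptions: the generator names are placeholders (the summary does not name the four models), the prompt wording is invented for illustration, and `GenerationJob` and `build_jobs` are hypothetical helpers. It only builds the list of jobs and does not call any model.

```python
from dataclasses import dataclass
from itertools import product

# Placeholder names: the summary does not identify the four generator LLMs.
GENERATORS = ["generator_a", "generator_b", "generator_c", "generator_d"]

# The three output formats named in the summary.
FORMATS = ["rephrase", "wiki_article", "qa_pairs"]

# Illustrative prompt templates (assumed wording, not taken from the paper).
PROMPTS = {
    "rephrase": "Rewrite this guideline passage in different words, "
                "keeping every clinical fact:\n\n{text}",
    "wiki_article": "Write an encyclopedia-style article covering this "
                    "guideline passage:\n\n{text}",
    "qa_pairs": "Write question-answer pairs that test the facts in this "
                "guideline passage:\n\n{text}",
}


@dataclass(frozen=True)
class GenerationJob:
    guideline_id: str
    generator: str
    fmt: str
    prompt: str


def build_jobs(guidelines: dict[str, str]) -> list[GenerationJob]:
    """Return one job per (guideline, generator, format) combination."""
    jobs = []
    for (gid, text), generator, fmt in product(
        guidelines.items(), GENERATORS, FORMATS
    ):
        prompt = PROMPTS[fmt].format(text=text)
        jobs.append(GenerationJob(gid, generator, fmt, prompt))
    return jobs


if __name__ == "__main__":
    demo = {"guideline_001": "Example guideline text."}
    for job in build_jobs(demo):
        print(job.guideline_id, job.generator, job.fmt)
```

With four generators and three formats, each guideline yields twelve jobs in this sketch; in practice a long guideline would likely be split into passages first, a detail the summary does not cover.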
In practice
- Use multiple generator LLMs for synthetic data diversity.
- Combine rephrased text, wiki-style articles, and QA pairs.
- Apply GRPO for factual verification and bias correction.
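The core idea of GRPO is that each sampled answer is judged relative to the other answers sampled for the same prompt, so no separate value model is needed. Below is a minimal sketch of that group-relative advantage, assuming a binary reward for getting a true/false verification right. The reward design, the use of population standard deviation, and the helper names are assumptions for illustration; the summary does not describe the paper's exact reward or training setup.

```python
from statistics import mean, pstdev


def binary_reward(predicted: bool, gold: bool) -> float:
    """Assumed reward: 1.0 for a correct true/false verdict, else 0.0."""
    return 1.0 if predicted == gold else 0.0


def group_relative_advantages(
    rewards: list[float], eps: float = 1e-6
) -> list[float]:
    """Normalize each reward by the mean and std of its own group.

    `rewards` holds the rewards of all completions sampled for one prompt.
    `eps` avoids division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; some implementations use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


if __name__ == "__main__":
    gold_label = False  # the assertion being verified is false
    sampled_verdicts = [True, False, False, True]  # four sampled completions
    rewards = [binary_reward(v, gold_label) for v in sampled_verdicts]
    print(group_relative_advantages(rewards))
```

Completions that agree with a false assertion receive a negative advantage in this setup, while correct rejections receive a positive one. When every completion in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.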
Topics
- Brazilian Clinical Guidelines
- LLM Domain Adaptation
- Synthetic Data Generation
- Continual Pre-training
- Group Relative Policy Optimization
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.