Teaching LLMs Brazilian Healthcare: Injecting Knowledge from Official Clinical Guidelines

2026-05-05 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

Researchers adapted Qwen2.5-14B-Instruct to Brazilian clinical guidelines, targeting a gap in LLM performance on the protocols of the country's Unified Health System (SUS). Starting from 178 official guidelines (5.4 million tokens), they used four generator LLMs to produce approximately 70 million tokens of synthetic data in three formats: rephrasings, wiki-style articles, and question-answer pairs. The model was continually pre-trained on this data and then tuned with Group Relative Policy Optimization (GRPO). The team also introduced two benchmarks: HealthBench-BR, with 1,780 true/false clinical assertions, and PCDT-QA, with 890 open-ended clinical questions. Their best model scored 83.9% on HealthBench-BR and 85.4% on PCDT-QA, outperforming larger general-purpose models such as GPT-5.2, Claude Sonnet 4.6, and Gemini 3.1 Pro, as well as Google AI Overview's RAG system. Ablation studies confirmed that both generator diversity and reinforcement learning contribute to these gains.
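To put the figures in the summary in proportion, the short calculation below derives the synthetic-to-source expansion ratio and the average benchmark coverage per guideline. It uses only the numbers quoted in the summary; the per-guideline values are plain averages, and the summary does not say how items were actually distributed across guidelines.

```python
# Figures quoted in the summary (token counts are approximate).
source_tokens = 5.4e6         # tokens in the official guidelines
synthetic_tokens = 70e6       # synthetic tokens generated from them
num_guidelines = 178
healthbench_br_items = 1780   # true/false clinical assertions
pcdt_qa_items = 890           # open-ended clinical questions

# How much larger the synthetic corpus is than its source material.
expansion = synthetic_tokens / source_tokens

# Simple averages; the real per-guideline distribution is not stated.
avg_tokens_per_guideline = source_tokens / num_guidelines
assertions_per_guideline = healthbench_br_items / num_guidelines
questions_per_guideline = pcdt_qa_items / num_guidelines

print(f"expansion ratio: ~{expansion:.1f}x")
print(f"average source tokens per guideline: ~{avg_tokens_per_guideline:,.0f}")
print(f"average assertions per guideline: {assertions_per_guideline:.1f}")
print(f"average questions per guideline: {questions_per_guideline:.1f}")
```

The synthetic corpus works out to roughly 13 times the size of the source guidelines, with on average 10 true/false assertions and 5 open-ended questions per guideline.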

Key takeaway

For AI engineers building clinical decision support systems for non-English healthcare contexts, this research shows that targeted adaptation of an open LLM can surpass larger general-purpose models. Prioritize diverse, multi-format synthetic data generated from official guidelines, and consider reinforcement learning to improve factual accuracy and reduce sycophancy, especially where critical medical information is involved. The approach offers a path to deploying smaller, more transparent models with stronger domain-specific knowledge.

Key insights

Domain-specific synthetic data combined with reinforcement learning lifted a 14B open model to 83.9% on HealthBench-BR and 85.4% on PCDT-QA, ahead of the larger general-purpose models it was compared against.

Principles

Diverse synthetic data formats improve knowledge absorption.
Reinforcement learning can correct model biases like sycophancy.
Full fine-tuning often outperforms LoRA for domain adaptation.
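On a true/false benchmark, one plausible way to surface the sycophancy mentioned above is to report, alongside accuracy, how often the model accepts assertions that are actually false. The sketch below is an assumption about how such a check could look, not the paper's evaluation code; the function name `score_true_false` and the metric names are hypothetical.

```python
def score_true_false(predictions, labels):
    """Score boolean verdicts against boolean gold labels.

    Returns accuracy plus two simple agreement-bias indicators
    (illustrative proxies, not metrics taken from the paper).
    """
    if len(predictions) != len(labels) or not labels:
        raise ValueError("predictions and labels must be non-empty and equal length")

    n = len(labels)
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / n

    # Share of all items the model marked as true.
    predicted_true_rate = sum(predictions) / n

    # Share of false assertions the model wrongly accepted as true.
    on_false = [p for p, y in zip(predictions, labels) if not y]
    false_accept_rate = sum(on_false) / len(on_false) if on_false else 0.0

    return {
        "accuracy": accuracy,
        "predicted_true_rate": predicted_true_rate,
        "false_accept_rate": false_accept_rate,
    }


# Toy data: the model agrees with one assertion that is actually false.
labels = [True, False, True, False]
predictions = [True, True, True, False]
result = score_true_false(predictions, labels)
print(result)
```

A model that leans toward agreeing would show a high `false_accept_rate` even when its overall accuracy looks acceptable, which is why the two numbers are worth reading together.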

Method

Generate multi-format synthetic data from the clinical guidelines with several different generator LLMs, continually pre-train the base model on that data, then apply reinforcement learning with Group Relative Policy Optimization (GRPO).
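The core idea of GRPO is that each sampled answer is scored relative to the other answers drawn for the same prompt, so no separate value model is needed. The sketch below shows only that group-relative advantage step, in its commonly described form (reward minus group mean, divided by group standard deviation). The binary `verdict_reward` is an assumed reward for true/false assertions; the summary does not describe the reward the authors used, and implementations differ on details such as population versus sample standard deviation.

```python
from statistics import mean, pstdev


def verdict_reward(predicted: str, gold: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the verdict matches the label."""
    return 1.0 if predicted.strip().lower() == gold.strip().lower() else 0.0


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an implementation choice
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled verdicts for one assertion whose gold label is "false".
gold = "false"
sampled_verdicts = ["true", "false", "false", "true"]

rewards = [verdict_reward(v, gold) for v in sampled_verdicts]
advantages = group_relative_advantages(rewards)

for verdict, adv in zip(sampled_verdicts, advantages):
    print(f"{verdict:>5}: advantage {adv:+.3f}")
```

Correct verdicts receive positive advantages and incorrect ones negative advantages. When every sample in a group earns the same reward, all advantages are zero and that prompt contributes no learning signal.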

In practice

Use multiple generator LLMs for synthetic data diversity.
Combine rephrased text, wiki-style articles, and QA pairs.
Apply GRPO for factual verification and bias correction.
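The first two practices above amount to crossing every guideline passage with every generator and every output format. The sketch below plans such a job list without calling any model. The generator names, prompt templates, and the `build_jobs` helper are hypothetical placeholders, not the prompts or models used in the paper, and real prompts for this setting would presumably be written in Portuguese.

```python
from itertools import product

# Placeholder identifiers; the summary says four generator LLMs were used
# but does not name them.
GENERATORS = ["generator_a", "generator_b", "generator_c", "generator_d"]

# Illustrative templates for the three formats named in the summary.
FORMATS = {
    "rephrase": "Rewrite this guideline excerpt in different words, "
                "preserving every clinical fact:\n\n{passage}",
    "wiki": "Write an encyclopedia-style article covering the content "
            "of this guideline excerpt:\n\n{passage}",
    "qa": "Write question-answer pairs that can be answered from this "
          "guideline excerpt alone:\n\n{passage}",
}


def build_jobs(passages):
    """Return one generation job per (passage, generator, format) triple."""
    jobs = []
    for (doc_id, passage), generator, (fmt, template) in product(
        passages.items(), GENERATORS, FORMATS.items()
    ):
        jobs.append({
            "doc_id": doc_id,
            "generator": generator,
            "format": fmt,
            "prompt": template.format(passage=passage),
        })
    return jobs


passages = {"guideline_001": "Placeholder excerpt from an official guideline."}
jobs = build_jobs(passages)
print(len(jobs))  # 1 passage x 4 generators x 3 formats = 12
```

Keeping the generator and format on each job record makes it straightforward to run ablations later, for example by filtering the corpus down to a single generator.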

Topics

Brazilian Clinical Guidelines
LLM Domain Adaptation
Synthetic Data Generation
Continual Pre-training
Group Relative Policy Optimization

Code references

hugoabonizio/clinical-protocols-br

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.