Serialisation Strategy Matters: How FHIR Data Format Affects LLM Medication Reconciliation
Summary
A systematic comparison of four FHIR serialisation strategies for large language models (LLMs) performing medication reconciliation shows that data format has a measurable effect on performance. Researchers tested Raw JSON, Markdown Table, Clinical Narrative, and Chronological Timeline strategies across five open-weight models (Phi-3.5-mini, Mistral-7B, BioMistral-7B, Llama-3.1-8B, Llama-3.3-70B) on 200 synthetic patients, for a total of 4,000 inference runs. For models up to 8B parameters, Clinical Narrative significantly outperformed Raw JSON, with Mistral-7B gaining 19 F1 points ($r=0.617$, $p<10^{-10}$). The trend reversed for the 70B Llama-3.3 model, where Raw JSON achieved the best mean F1 of 0.9956. Omission was the dominant error mode across all 20 model-strategy combinations: models missed active medications more often than they fabricated them. Smaller models exhibited a capacity ceiling, degrading sharply for patients with more than 7-10 concurrent active medications. BioMistral-7B, a domain-pretrained model without instruction tuning, produced zero usable output.
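To make the difference between two of the strategies concrete, here is a minimal Python sketch that renders the same FHIR R4 MedicationRequest resources as Raw JSON and as a Clinical Narrative. The sample data, the function names, and the sentence template are assumptions for illustration; the summary does not specify the exact templates the researchers used.

```python
import json

# Two hand-written FHIR R4 MedicationRequest resources (illustrative data only).
REQUESTS = [
    {
        "resourceType": "MedicationRequest",
        "status": "active",
        "medicationCodeableConcept": {"text": "Metformin 500 mg oral tablet"},
        "authoredOn": "2023-04-12",
        "dosageInstruction": [{"text": "1 tablet twice daily"}],
    },
    {
        "resourceType": "MedicationRequest",
        "status": "stopped",
        "medicationCodeableConcept": {"text": "Amoxicillin 250 mg oral capsule"},
        "authoredOn": "2022-11-03",
        "dosageInstruction": [{"text": "1 capsule three times daily"}],
    },
]


def to_raw_json(requests):
    """Raw JSON strategy: pass the resources to the model unchanged."""
    return json.dumps(requests, indent=2)


def to_clinical_narrative(requests):
    """Clinical Narrative strategy (hypothetical template): one sentence pair per request."""
    lines = []
    for req in requests:
        name = req.get("medicationCodeableConcept", {}).get("text", "an unnamed medication")
        dosage = "; ".join(
            d["text"] for d in req.get("dosageInstruction", []) if "text" in d
        )
        line = f"On {req.get('authoredOn', 'an unknown date')}, the patient was prescribed {name}"
        if dosage:
            line += f" ({dosage})"
        line += f". This prescription is currently {req.get('status', 'unknown')}."
        lines.append(line)
    return "\n".join(lines)


raw = to_raw_json(REQUESTS)
narrative = to_clinical_narrative(REQUESTS)
print(narrative)
```

The narrative form states each prescription's status in plain language, whereas the raw form leaves the model to locate the `status` field inside nested JSON.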
Key takeaway
For AI Architects and NLP Engineers deploying LLMs for medication reconciliation, the choice of FHIR serialisation format directly affects accuracy. If you are using models up to 8B parameters, adopt the Clinical Narrative format, which significantly outperformed Raw JSON in this study. For models at the 70B scale, Raw JSON performed best on the one model tested (Llama-3.3-70B) and is also the simpler choice. Design your clinical safety auditing around detecting omissions, because models are more likely to miss medications than to hallucinate them, especially for polypharmacy patients.
Key insights
The FHIR serialisation format significantly affects LLM medication reconciliation performance, and the best format depends on model size.
Principles
- Optimal serialisation strategy is model-size-dependent.
- Omission is the dominant error mode in LLM medication reconciliation.
- Domain pretraining alone is insufficient for structured extraction tasks.
Method
Four FHIR serialisation strategies (Raw JSON, Markdown Table, Clinical Narrative, Chronological Timeline) were systematically compared across five LLMs on a 200-patient synthetic dataset, measuring F1, precision, and recall.
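As an illustration of the three metrics, the following sketch scores one patient's predicted active-medication list against a reference list. It assumes set-based matching on case- and whitespace-normalised names; the summary does not say how the study matched medications, and the function names and sample lists are hypothetical.

```python
def normalise(name):
    """Lower-case and collapse whitespace so trivially different spellings match."""
    return " ".join(name.lower().split())


def score(predicted, reference):
    """Set-based precision, recall, and F1 for one patient's medication list."""
    pred = {normalise(n) for n in predicted}
    ref = {normalise(n) for n in reference}
    if not pred and not ref:
        # Assumption: an empty prediction for a patient with no active
        # medications counts as a perfect answer.
        return {"precision": 1.0, "recall": 1.0, "f1": 1.0}
    true_positives = len(pred & ref)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(ref) if ref else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


reference = ["Metformin", "Lisinopril", "Atorvastatin", "Aspirin"]
predicted = ["metformin", "Lisinopril", "Warfarin"]
result = score(predicted, reference)
print(result)
```

In this example the model omits two medications and fabricates one, so recall (0.5) is lower than precision (about 0.67), which is the pattern the omission finding describes.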
In practice
- Use Clinical Narrative for LLMs up to 8B parameters.
- Use Raw JSON for LLMs with 70B parameters or more.
- Track recall in production, since omissions are the dominant error in medication reconciliation.
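The recommendations above could be wired into a pipeline roughly as follows. This is a sketch under stated assumptions: `choose_strategy`, `audit`, and `needs_manual_review` are hypothetical helpers, the 8B and 70B cut-offs mirror the model sizes reported in the summary, sizes in between were not tested, and the review threshold of 7 is the lower end of the reported 7-10 medication range.

```python
def choose_strategy(params_billion):
    """Pick a serialisation strategy from model size (in billions of parameters)."""
    if params_billion <= 8:
        return "clinical_narrative"
    if params_billion >= 70:
        return "raw_json"
    # Sizes between 8B and 70B were not covered by the summary; benchmark both.
    return None


def audit(predicted, reference):
    """Split the errors for one patient into omissions and fabrications."""
    pred = {n.strip().lower() for n in predicted}
    ref = {n.strip().lower() for n in reference}
    return {
        "omitted": sorted(ref - pred),      # active medications the model missed
        "fabricated": sorted(pred - ref),   # medications absent from the reference
    }


def needs_manual_review(active_med_count, params_billion, threshold=7):
    """Flag polypharmacy patients when a small model is used (assumed threshold)."""
    return params_billion <= 8 and active_med_count > threshold


report = audit(
    predicted=["Metformin", "Warfarin"],
    reference=["Metformin", "Lisinopril", "Aspirin"],
)
print(choose_strategy(7), report, needs_manual_review(9, 7))
```

Logging the `omitted` list separately from `fabricated` keeps the dominant error mode visible, rather than hiding it inside a single F1 figure.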
Topics
- FHIR Data Serialisation
- LLM Medication Reconciliation
- Clinical Narrative Format
- Raw JSON Format
- LLM Performance Evaluation
Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.