Serialisation Strategy Matters: How FHIR Data Format Affects LLM Medication Reconciliation

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Clinical Informatics · Depth: Advanced, extended

Summary

A systematic comparison of four FHIR serialisation strategies for large language models (LLMs) performing medication reconciliation shows that data format significantly affects performance. Researchers tested Raw JSON, Markdown Table, Clinical Narrative, and Chronological Timeline strategies across five open-weight models (Phi-3.5-mini, Mistral-7B, BioMistral-7B, Llama-3.1-8B, Llama-3.3-70B) using 200 synthetic patients, for a total of 4,000 inference runs (200 patients × 4 strategies × 5 models). For models up to 8B parameters, Clinical Narrative significantly outperformed Raw JSON, with Mistral-7B gaining 19 F1 points ($r=0.617$, $p<10^{-10}$). The trend reversed for the 70B Llama-3.3 model, where Raw JSON achieved the best mean F1 of 0.9956. Omission was the dominant error mode across all 20 model-strategy combinations: models missed active medications more often than they fabricated them. Smaller models exhibited a capacity ceiling, degrading sharply for patients with more than 7-10 concurrent active medications. BioMistral-7B, a domain-pretrained model without instruction tuning, produced zero usable output.

Key takeaway

For AI Architects and NLP Engineers deploying LLMs for medication reconciliation, the choice of FHIR data serialisation is critical. If you are using models up to 8B parameters, adopt the Clinical Narrative format, which significantly outperformed Raw JSON in this study. At the 70B scale, represented here by Llama-3.3-70B alone, Raw JSON scored best and is also the simpler choice. Design your clinical safety auditing to focus on detecting omissions, because models are more likely to miss medications than to hallucinate them, especially for polypharmacy patients.
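An omission-focused audit can be sketched as a set comparison between the medications a model returns and a reference list. This is a minimal illustration, not the study's scoring code: the function names `normalise` and `score_reconciliation`, the name-matching rule (lowercase, collapsed whitespace), and the sample drug lists are all assumptions made for the example.

```python
from typing import Dict, Iterable


def normalise(name: str) -> str:
    """Lowercase and collapse whitespace (an assumed, deliberately naive matching rule)."""
    return " ".join(name.lower().split())


def score_reconciliation(predicted: Iterable[str], reference: Iterable[str]) -> Dict[str, object]:
    """Compare a predicted active-medication list against a reference list.

    Omissions (reference items the model missed) and fabrications
    (predicted items absent from the reference) are reported separately
    so that an audit can prioritise omissions.
    """
    pred = {normalise(m) for m in predicted}
    ref = {normalise(m) for m in reference}

    true_pos = pred & ref
    omissions = ref - pred        # drives recall down
    fabrications = pred - ref     # drives precision down

    if not pred and not ref:
        # Nothing expected and nothing returned: treat as a perfect match.
        precision = recall = f1 = 1.0
    else:
        precision = len(true_pos) / len(pred) if pred else 0.0
        recall = len(true_pos) / len(ref) if ref else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "omissions": sorted(omissions),
        "fabrications": sorted(fabrications),
    }


if __name__ == "__main__":
    # Hypothetical patient: four active medications, model returns three names.
    reference = ["Metformin", "Lisinopril", "Atorvastatin", "Aspirin"]
    predicted = ["metformin", "Lisinopril ", "Warfarin"]
    print(score_reconciliation(predicted, reference))
```

In a real deployment the naive string match would likely be replaced by matching on coded identifiers, and the `omissions` list would be the field routed to clinical review first.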

Key insights

FHIR data serialisation significantly affects LLM medication reconciliation performance, and the best format depends on model size: Clinical Narrative for models up to 8B parameters, Raw JSON for Llama-3.3-70B.

Method

Four FHIR serialisation strategies (Raw JSON, Markdown Table, Clinical Narrative, Chronological Timeline) were systematically compared across five LLMs on a 200-patient synthetic dataset, measuring F1, precision, and recall.

Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.