Please don’t trust your chatbot for medical advice
Summary
Recent studies highlight serious limitations and risks in using large language models (LLMs) for medical advice, particularly when the general public uses them directly. A BMJ study of Gemini, DeepSeek, Meta AI, ChatGPT, and Grok found that nearly half of the responses to medical questions were problematic, showing hallucinations, fabricated citations, and overconfidence. Separately, research in JAMA Network Open assessed 21 frontier models and concluded that they remain limited in early diagnostic reasoning and are unreliable for unsupervised patient-facing clinical decision-making. Two additional Nature Medicine studies reinforced these concerns: LLMs identified relevant conditions in fewer than 34.5% of cases and undertriaged 52% of gold-standard emergencies, raising critical safety issues for consumer-scale deployment.
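For readers unfamiliar with the term, an undertriage figure like the one above is typically the share of true emergencies that were assigned a lower urgency level than they should have been. The sketch below illustrates that calculation under stated assumptions: the four-level acuity scale, the function name `undertriage_rate`, and the sample labels are all hypothetical and are not taken from the studies, whose exact definitions may differ.

```python
from typing import Sequence

# Hypothetical acuity scale (an assumption for illustration): higher means more urgent.
ACUITY = {"self_care": 0, "routine": 1, "urgent": 2, "emergency": 3}


def undertriage_rate(gold: Sequence[str], predicted: Sequence[str]) -> float:
    """Share of gold-standard emergencies that were assigned a lower acuity level."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    # Keep only the cases that the gold standard labels as emergencies.
    emergencies = [(g, p) for g, p in zip(gold, predicted) if g == "emergency"]
    if not emergencies:
        raise ValueError("no gold-standard emergencies in the sample")
    # A case is undertriaged when the predicted acuity is below the gold acuity.
    missed = sum(1 for g, p in emergencies if ACUITY[p] < ACUITY[g])
    return missed / len(emergencies)


# Made-up labels for illustration only, not data from any study.
gold = ["emergency", "emergency", "routine", "emergency", "emergency"]
pred = ["urgent", "emergency", "routine", "routine", "emergency"]
print(f"{undertriage_rate(gold, pred):.0%}")  # prints "50%"
```

In this toy sample, two of the four gold-standard emergencies receive a lower acuity label, so the rate is 50%. The non-emergency case is ignored because undertriage is measured only against cases that should have been escalated.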
Key takeaway
Healthcare providers and AI developers considering integrating LLMs into patient-facing applications should read these converging studies as a strong call for caution. Prioritize rigorous validation and robust human oversight, especially for diagnostic reasoning and triage systems, to avoid amplifying misinformation and to protect patient safety. Do not deploy consumer-scale AI triage until its safety has been prospectively validated.
Key insights
LLMs are unreliable for medical advice, frequently generating misinformation with overconfidence and lacking clinical reasoning.
Principles
- LLMs are "frequently wrong, never in doubt."
- LLM outputs are consistently expressed with confidence.
- Patients struggle to guide LLMs effectively.
In practice
- Avoid using LLMs for unsupervised patient-facing clinical decisions.
- Educate the public on LLM limitations in healthcare.
- Implement crisis safeguards before deploying AI triage systems.
Topics
- Large Language Models
- Medical Misinformation
- Clinical Reasoning
- Patient Safety
- Diagnostic Limitations
Best for: CTO, VP of Engineering/Data, Director of AI/ML, General Interest, AI Ethicist, Policy Maker
Editorial summary, takeaway, and curation by AIssential. Original article published by Marcus on AI.