Americans ask AI for health care. Hospitals think the answer is more chatbots.
Summary
Many Americans are using large language models (LLMs) for health advice, prompting health systems to develop and deploy their own branded chatbots. For example, K Health and Hartford HealthCare are rolling out PatientGPT to tens of thousands of existing patients, aiming to provide a safer, more convenient alternative to commercial AI tools. Similarly, Epic is introducing Emmie, an AI chat assistant integrated into MyChart, with Sutter Health and Reid Health adopting it. While proponents emphasize convenience and digital equity, experts raise concerns about chatbot readiness, monitoring, liability, and the lack of evidence demonstrating improved patient outcomes. A Nature Medicine study found that LLMs correctly identified medical conditions 95% of the time with expert prompts but only 33% of the time with user prompts, highlighting a disconnect between benchmark scores and real-world performance. The US healthcare system already faces challenges such as lower life expectancy and limited access to care. One in three adults use AI for health information, often because of cost or the lack of a primary care provider.
Key takeaway
For healthcare executives considering AI chatbot integration, prioritize rigorous, real-world validation of patient outcomes and safety protocols over perceived convenience. Your organization should establish clear liability frameworks and robust monitoring, including human oversight, to mitigate risks associated with unproven accuracy and potential misinformation, especially given the US healthcare system's existing challenges. Do not assume benchmark scores translate directly to safe, effective patient care.
Key insights
Health systems are deploying branded AI chatbots despite unproven patient outcomes and significant accuracy concerns.
Principles
- AI chatbot accuracy varies significantly with user prompting skill.
- Real-world AI performance often diverges from benchmark scores.
Method
Hartford HealthCare's PatientGPT uses iterative stress testing (red teaming) to reduce failure rates in high-risk scenarios; the reported failure rate dropped from 30% to 8.5%.
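To put the reported drop in perspective, it helps to separate the absolute change (in percentage points) from the relative change. The short sketch below does that arithmetic for the 30% and 8.5% figures quoted above; the function name `failure_rate_reduction` is hypothetical and only illustrates the calculation, not how Hartford HealthCare measured its results.

```python
def failure_rate_reduction(before_pct: float, after_pct: float) -> tuple[float, float]:
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    if before_pct <= 0:
        raise ValueError("before_pct must be positive")
    absolute = before_pct - after_pct   # change in percentage points
    relative = absolute / before_pct    # share of the original failures eliminated
    return absolute, relative


# Figures reported for PatientGPT in high-risk scenarios.
absolute, relative = failure_rate_reduction(30.0, 8.5)
print(f"{absolute:.1f} percentage points, {relative:.1%} relative reduction")
# prints "21.5 percentage points, 71.7% relative reduction"
```

Even after roughly a 72% relative reduction, about 1 in 12 high-risk test scenarios still failed, which is why the monitoring concerns raised by experts remain relevant.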
In practice
- Integrate AI chatbots into existing patient portals like MyChart.
- Implement human review for a subset of AI interactions.
- Use AI for visit agendas and understanding test results.
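One way the human-review item above could be approached is deterministic sampling: hash each conversation ID and route a fixed fraction to reviewers, while always routing anything flagged as high risk. This is a minimal sketch under stated assumptions. The function `needs_human_review`, the 5% default, and the `high_risk` flag are all hypothetical and do not describe how PatientGPT or Emmie select interactions for review.

```python
import hashlib


def needs_human_review(
    conversation_id: str,
    review_fraction: float = 0.05,  # assumed sampling rate, not from the article
    high_risk: bool = False,        # assumed flag set by an upstream triage step
) -> bool:
    """Decide whether a chatbot conversation should be routed to a human reviewer."""
    if high_risk:
        return True  # always review conversations flagged as high risk
    # Hash the ID so the same conversation always gets the same decision.
    digest = hashlib.sha256(conversation_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") / 2**64  # uniform value in [0, 1)
    return bucket < review_fraction


# Example: count how many of 10,000 synthetic conversation IDs get sampled.
sampled = sum(needs_human_review(f"conv-{i}") for i in range(10_000))
print(f"{sampled} of 10000 conversations selected for review")
```

Hashing rather than calling a random number generator keeps the decision reproducible, so an auditor can later confirm which conversations should have been reviewed.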
Topics
- AI Chatbots
- Healthcare Access
- PatientGPT
- Emmie
- Medical Misinformation
Best for: CTO, VP of Engineering/Data, AI Product Manager, Director of AI/ML, Executive, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by AI - Ars Technica.