Implicit Geographic Inference in LLM Medical Triage: Language-Driven Disparities in Emergency Recommendations
Summary
A study using Gemini 3.5 Flash investigated implicit geographic inference in LLM medical triage and found language-driven disparities in emergency recommendations. Researchers evaluated a neurological symptom profile (persistent headache, blurred vision, nausea) in six languages: English, Spanish, Chinese, Hindi, Japanese, and Arabic, with 30 runs per condition, totaling 450 API calls. The model recommended emergency room visits at rates ranging from 0% (Japanese, Hindi) to 30% (English, Arabic), despite assigning nearly identical severity scores (7.7 to 8.0 out of 10) across all languages. Crucially, adding a single sentence specifying a US location increased ER recommendations by up to 76.7 percentage points for non-English prompts. Conversely, adding a Tokyo location to the English prompt reduced the ER rate from 30% to 6.7%. Back-translation confirmed that the disparity stems from implicit geographic inference, not translation quality.
Key takeaway
Machine learning engineers deploying LLMs in global healthcare should rigorously test model outputs across all target languages and geographic contexts. A model may infer location from the input language alone and give different recommendations for the same symptoms. Explicitly specifying location in the prompt can mitigate this bias. Prioritize comprehensive linguistic and cultural validation to ensure equitable and safe AI system performance.
Key insights
LLMs exhibit implicit geographic bias in medical triage, leading to disparate emergency recommendations based solely on input language.
Principles
- LLM outputs can vary significantly by input language.
- Implicit geographic inference influences medical recommendations.
- Language choice can introduce bias in AI systems.
Method
Evaluated Gemini 3.5 Flash with a neurological symptom profile across six languages. Measured ER recommendation rates and severity scores, then tested location anchors and back-translation.
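The measurement step can be sketched as a simple tally. This is a minimal illustration, not the study's code: the function names, the condition labels, and the record layout are assumptions, and the sample counts (9 of 30 and 2 of 30) are chosen only because they reproduce the reported English rates of 30% and 6.7%.

```python
from collections import defaultdict

def er_rates(runs):
    """Return the percentage of runs that recommended the ER, per condition.

    `runs` is an iterable of (condition, recommended_er) pairs; this record
    layout is a hypothetical choice for the sketch.
    """
    totals = defaultdict(int)
    er_counts = defaultdict(int)
    for condition, recommended_er in runs:
        totals[condition] += 1
        er_counts[condition] += int(recommended_er)
    return {c: 100.0 * er_counts[c] / totals[c] for c in totals}

def percentage_point_shift(rates, baseline, variant):
    """Difference in ER rate (variant minus baseline), in percentage points."""
    return rates[variant] - rates[baseline]

# Illustrative data only: 30 runs per condition, counts matching the
# reported English baseline (30%) and English + Tokyo (6.7%) rates.
runs = (
    [("en", True)] * 9 + [("en", False)] * 21
    + [("en+tokyo", True)] * 2 + [("en+tokyo", False)] * 28
)

rates = er_rates(runs)
shift = percentage_point_shift(rates, "en", "en+tokyo")
print(rates, round(shift, 1))
```

Reporting the shift in percentage points rather than as a relative change keeps the comparison consistent with how the summary states its results.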
In practice
- Test LLM outputs across diverse languages.
- Explicitly anchor geographic context in prompts.
- Use back-translation to diagnose language bias.
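The first two practices above can be sketched as a small test-grid builder plus a disparity check. Everything here is a hedged illustration under stated assumptions: the helper names, the prompt and anchor wording, and the 10 percentage point threshold are hypothetical and are not taken from the study. The example rates passed to the check are the reported English, Japanese, and Arabic baselines.

```python
from typing import Dict, List, Optional

def with_location(prompt: str, anchor_sentence: Optional[str]) -> str:
    """Append a location sentence written in the prompt's own language.

    Returns the prompt unchanged when no anchor sentence is given.
    """
    if not anchor_sentence:
        return prompt
    return f"{prompt.rstrip()} {anchor_sentence.strip()}"

def build_conditions(
    prompts: Dict[str, str],
    anchors: Dict[str, Dict[str, str]],
) -> Dict[str, str]:
    """Build one unanchored and one anchored prompt per language and anchor."""
    conditions = {}
    for lang, prompt in prompts.items():
        conditions[f"{lang}/no-location"] = prompt
        for anchor_name, by_lang in anchors.items():
            if lang in by_lang:
                conditions[f"{lang}/{anchor_name}"] = with_location(prompt, by_lang[lang])
    return conditions

def flag_disparities(
    rates: Dict[str, float],
    reference: str,
    threshold_pp: float = 10.0,  # assumed tolerance, not a value from the study
) -> List[str]:
    """List conditions whose ER rate differs from the reference by more than the threshold."""
    ref = rates[reference]
    return sorted(
        c for c, r in rates.items()
        if c != reference and abs(r - ref) > threshold_pp
    )

# Hypothetical prompt and anchor wording, for illustration only.
prompts = {
    "en": "I have a persistent headache, blurred vision, and nausea.",
    "es": "Tengo dolor de cabeza persistente, visión borrosa y náuseas.",
}
anchors = {
    "us": {"en": "I am in the United States.", "es": "Estoy en Estados Unidos."},
}

conditions = build_conditions(prompts, anchors)
flagged = flag_disparities({"en": 30.0, "ja": 0.0, "ar": 30.0}, reference="en")
print(sorted(conditions), flagged)
```

Keeping the anchor sentence in the same language as the prompt avoids mixing a language change into the location test, so any shift in the ER rate can be attributed to the stated location.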
Topics
- LLM Bias
- Medical Triage
- Geographic Inference
- Language Disparity
- Gemini 3.5 Flash
- AI Ethics
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.