Implicit Geographic Inference in LLM Medical Triage: Language-Driven Disparities in Emergency Recommendations

2026-05-31 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A study using Gemini 3.5 Flash investigated implicit geographic inference in LLM medical triage, revealing language-driven disparities in emergency recommendations. Researchers evaluated a neurological symptom profile (persistent headache, blurred vision, nausea) across six languages: English, Spanish, Chinese, Hindi, Japanese, and Arabic, with 30 runs per condition, totaling 450 API calls. The model recommended emergency room visits at rates from 0% (Japanese, Hindi) to 30% (English, Arabic), despite assigning nearly identical severity scores (7.7-8.0/10) across all languages. Crucially, adding a single sentence specifying a US location increased ER recommendations by up to 76.7 percentage points for non-English prompts. Conversely, an English prompt with a Tokyo location reduced the ER rate from 30% to 6.7%. Back-translation confirmed the disparity stems from implicit geographic inference, not translation quality.
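The reported percentages are easier to read as run counts. The short sketch below converts them, assuming every rate is a fraction of the 30 runs per condition stated above. The helper names are illustrative and do not come from the study. It also shows that 450 calls at 30 runs each implies 15 conditions, which the summary does not enumerate.

```python
RUNS_PER_CONDITION = 30   # stated in the summary
TOTAL_CALLS = 450         # stated in the summary


def rate_to_count(rate_percent: float, runs: int = RUNS_PER_CONDITION) -> int:
    """Nearest whole number of runs that matches a reported percentage."""
    return round(rate_percent / 100 * runs)


def count_to_rate(count: int, runs: int = RUNS_PER_CONDITION) -> float:
    """Percentage (one decimal place) for a given number of runs."""
    return round(100 * count / runs, 1)


# Figures quoted in the summary, keyed by illustrative labels.
reported = {
    "english_baseline_er": 30.0,
    "english_tokyo_er": 6.7,
    "max_us_anchor_gain": 76.7,
}

counts = {name: rate_to_count(pct) for name, pct in reported.items()}
conditions = TOTAL_CALLS // RUNS_PER_CONDITION

print(counts)      # runs out of 30 behind each percentage
print(conditions)  # number of 30-run conditions implied by 450 calls
```

With 30 runs, a single response moves a rate by about 3.3 percentage points, so small differences between conditions should be read with that granularity in mind.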

Key takeaway

Machine learning engineers deploying LLMs in global healthcare should test model outputs across every target language and geographic context. A model can give different recommendations based solely on the input language, even when its severity assessment barely changes. In this study, a single sentence stating the location shifted recommendations substantially, so state geographic context explicitly in prompts rather than leaving the model to infer it. Treat linguistic and cultural validation as a prerequisite for equitable and safe deployment.

Key insights

LLMs exhibit implicit geographic bias in medical triage, leading to disparate emergency recommendations based solely on input language.

Principles

LLM outputs can vary significantly by input language.
Implicit geographic inference influences medical recommendations.
Language choice can introduce bias in AI systems.

Method

Evaluated Gemini 3.5 Flash with a neurological symptom profile across six languages. Measured ER recommendation rates and severity scores, then tested location anchors and back-translation.
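A minimal sketch of how such an evaluation loop could be structured, not the study's actual harness. `TriageResult`, `evaluate`, and `ask` are hypothetical names. `ask` stands for whatever wrapper sends a prompt to the model and parses the reply into an ER flag and a severity score. `fake_ask` is a stub with arbitrary behavior that only lets the example run offline and says nothing about real model output.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict


@dataclass
class TriageResult:
    """Parsed outcome of one model response (hypothetical structure)."""
    recommends_er: bool
    severity: float  # severity score on a 0-10 scale


def evaluate(
    conditions: Dict[str, str],
    ask: Callable[[str], TriageResult],
    runs: int = 30,
) -> Dict[str, Dict[str, float]]:
    """Run each condition `runs` times; report ER rate and mean severity."""
    summary = {}
    for name, prompt in conditions.items():
        results = [ask(prompt) for _ in range(runs)]
        summary[name] = {
            "er_rate": sum(r.recommends_er for r in results) / runs,
            "mean_severity": mean(r.severity for r in results),
        }
    return summary


def fake_ask(prompt: str) -> TriageResult:
    """Stand-in for a real model call; its behavior is arbitrary."""
    return TriageResult(recommends_er="United States" in prompt, severity=8.0)


report = evaluate(
    {
        "baseline": "<symptom description>",
        "us_anchor": "<symptom description> I am in the United States.",
    },
    fake_ask,
)
print(report)
```

Reporting the ER rate and the mean severity side by side is what exposes the pattern described in the summary: recommendations diverge while severity scores stay nearly constant.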

In practice

Test LLM outputs across diverse languages.
Explicitly anchor geographic context in prompts.
Use back-translation to diagnose language bias.
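The three practices above can be combined into one set of test conditions. This sketch rests on assumptions: the function names, the condition labels, and the reading of back-translation (running an English rendering of each non-English prompt) are illustrative, not taken from the study. Prompt texts are placeholders.

```python
from typing import Dict


def with_location(prompt: str, location_sentence: str) -> str:
    """Append one sentence stating where the patient is."""
    return f"{prompt.rstrip()} {location_sentence.strip()}"


def build_conditions(
    baseline: Dict[str, str],
    location_sentences: Dict[str, str],
    back_translations: Dict[str, str],
) -> Dict[str, str]:
    """Create baseline, anchored, and back-translated prompts per language."""
    conditions = {}
    for lang, prompt in baseline.items():
        conditions[f"{lang}/baseline"] = prompt
        if lang in location_sentences:
            conditions[f"{lang}/anchored"] = with_location(
                prompt, location_sentences[lang]
            )
        if lang in back_translations:
            conditions[f"{lang}/back_translated"] = back_translations[lang]
    return conditions


def shift_in_points(rates: Dict[str, float], lang: str, variant: str) -> float:
    """Percentage-point change of a variant versus that language's baseline."""
    delta = rates[f"{lang}/{variant}"] - rates[f"{lang}/baseline"]
    return round(100 * delta, 1)


conditions = build_conditions(
    baseline={"ja": "<symptom description in Japanese>"},
    location_sentences={"ja": "<sentence in Japanese stating a US location>"},
    back_translations={"ja": "<English back-translation of the Japanese prompt>"},
)
print(sorted(conditions))
```

If the anchored variant shifts the ER rate sharply while the back-translated text carries the same clinical content as the original, the gap is more plausibly explained by inferred geography than by translation loss.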

Topics

LLM Bias
Medical Triage
Geographic Inference
Language Disparity
Gemini 3.5 Flash
AI Ethics

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.