Artificial Intolerance: Stigmatizing Language in Clinical Documentation Skews Large Language Model Decision-Making

Source: cs.CL updates on arXiv.org · Field: Science & Research (Health & Medical Research, Mathematics & Computational Sciences, Research Methodology & Innovation) · Depth: Expert, extended

Summary

A study evaluated nine frontier large language models (LLMs) for their susceptibility to stigmatizing language (SL) in clinical notes, focusing on sickle cell disease, obesity, cirrhosis, and fibromyalgia. Researchers designed clinical vignettes with neutral baselines and stigmatized versions containing varying intensities of three SL phenotypes: doubt, blame, and maligning. All nine LLMs exhibited substantial bias, with clinical decision-making significantly skewed towards less aggressive patient management, even with a single SL sentence. A dose-response relationship was observed as SL frequency increased. Furthermore, exposure to SL consistently degraded simulated clinician attitudes across all models and scenarios. Prompt-based mitigation strategies, including Chain-of-Thought (CoT) reasoning and model self-debiasing, showed limited efficacy, suggesting LLMs struggle to explicitly identify SL while remaining implicitly influenced by it. The study concludes that LLMs inherit and exacerbate human cognitive biases from SL, posing a risk to health equity by potentially automating and scaling disparities in patient care.

Key takeaway

CTOs and VPs of Engineering integrating LLMs into clinical workflows should recognize that current frontier models are highly susceptible to stigmatizing language in medical documentation: even a single stigmatizing sentence biased clinical decisions and degraded simulated clinician attitudes. To avoid automating and scaling healthcare disparities, teams should prioritize model-level alignment criteria and domain-specific fine-tuning that neutralize implicit linguistic bias, rather than relying on prompt-based mitigation, which showed limited efficacy in the study.

Key insights

LLMs inherit and propagate human cognitive biases from stigmatizing language in clinical notes, leading to less aggressive patient management.

Method

The study used controlled clinical vignettes with neutral and stigmatized versions across four conditions, evaluating nine LLMs on clinical decision-making and simulated attitudes, then testing Chain-of-Thought and self-debiasing mitigation strategies.
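The design described above can be sketched roughly as follows. This is a minimal illustration, not the study's code: the names `Vignette`, `build_variants`, `dose_response`, and `toy_score` are hypothetical, the example sentences are invented placeholders rather than the study's materials, and `score_fn` stands in for whatever model call returns a numeric measure of management aggressiveness.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List

# Invented placeholder sentences, one per SL phenotype (NOT the study's materials).
SL_SENTENCES: Dict[str, List[str]] = {
    "doubt": ["Patient claims the pain is severe."],
    "blame": ["Patient has been noncompliant with treatment."],
    "maligning": ["Patient is a frequent flyer in this department."],
}


@dataclass(frozen=True)
class Vignette:
    condition: str
    phenotype: str  # "neutral" for the baseline version
    dose: int       # number of SL sentences appended to the baseline
    text: str


def build_variants(condition: str, baseline: str, max_dose: int = 3) -> List[Vignette]:
    """Return the neutral baseline plus one variant per (phenotype, dose) pair."""
    variants = [Vignette(condition, "neutral", 0, baseline)]
    for phenotype, sentences in SL_SENTENCES.items():
        for dose in range(1, max_dose + 1):
            # Cycle through the available sentences until the requested dose is reached.
            inserted = [sentences[i % len(sentences)] for i in range(dose)]
            text = " ".join([baseline, *inserted])
            variants.append(Vignette(condition, phenotype, dose, text))
    return variants


def dose_response(
    variants: List[Vignette], score_fn: Callable[[str], float]
) -> Dict[int, float]:
    """Mean score shift relative to the neutral baseline, grouped by dose."""
    baseline_score = next(score_fn(v.text) for v in variants if v.dose == 0)
    shifts: Dict[int, List[float]] = {}
    for v in variants:
        if v.dose > 0:
            shifts.setdefault(v.dose, []).append(score_fn(v.text) - baseline_score)
    return {dose: mean(values) for dose, values in sorted(shifts.items())}


def toy_score(text: str) -> float:
    """Toy stand-in for a model call: 5.0 minus 0.5 per inserted SL sentence."""
    hits = sum(text.count(s) for group in SL_SENTENCES.values() for s in group)
    return 5.0 - 0.5 * hits
```

In a real evaluation, `toy_score` would be replaced by a function that prompts a model with the vignette and parses its management decision into a number. A negative shift that grows in magnitude with dose would correspond to the dose-response pattern the summary describes.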

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Scientist, Research Scientist, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.