Detecting HIV-Related Stigma in Clinical Narratives Using Large Language Models
Summary
A study developed a large language model (LLM)-based tool to identify HIV-related stigma in clinical narratives from people living with HIV (PLWH) at the University of Florida Health between 2012 and 2022. Researchers identified candidate sentences using expert-curated keywords and clinical word embeddings, then manually annotated 1,332 sentences across four stigma subscales: Concern with Public Attitudes, Disclosure Concerns, Negative Self-Image, and Personalized Stigma. The study compared encoder-based models such as GatorTron-large and BERT with generative LLMs including GPT-OSS-20B, LLaMA-8B, and MedGemma-27B. GatorTron-large achieved the highest overall performance, with a Micro-F1 score of 0.62. Few-shot prompting significantly improved generative model performance, with 5-shot GPT-OSS-20B and LLaMA-8B reaching Micro-F1 scores of 0.57 and 0.59, respectively. Negative Self-Image was the most predictable subscale, while Personalized Stigma proved the most challenging.
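Because the results above are reported as Micro-F1, a small worked example may help readers interpret the numbers. The sketch below assumes a multi-label setup in which each sentence can carry zero or more subscale labels; the summary does not confirm this, and the toy annotations are invented for illustration only.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1: pool true/false positives and false negatives
    over all sentences and labels, then compute a single F1 score."""
    tp = fp = fn = 0
    for g, p in zip(gold, pred):
        tp += len(g & p)   # labels predicted and annotated
        fp += len(p - g)   # labels predicted but not annotated
        fn += len(g - p)   # labels annotated but not predicted
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Invented toy annotations (assumption: not data from the study).
gold = [
    {"Disclosure Concerns"},
    {"Negative Self-Image"},
    set(),
    {"Personalized Stigma", "Disclosure Concerns"},
]
pred = [
    {"Disclosure Concerns"},
    {"Negative Self-Image", "Personalized Stigma"},
    set(),
    {"Disclosure Concerns"},
]

score = micro_f1(gold, pred)  # tp=3, fp=1, fn=1 -> 6 / 8 = 0.75
print(f"Micro-F1: {score:.2f}")
```

Micro-averaging weights every label decision equally, so frequent subscales influence the score more than rare ones.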
Key takeaway
For NLP engineers developing tools for sensitive clinical data, this research indicates that fine-tuned encoder models like GatorTron-large offer superior performance for specific stigma detection tasks compared to generative LLMs in zero-shot contexts. Consider using few-shot prompting to improve generative model accuracy if you opt for those architectures, but be aware of varying predictability across different stigma categories.
Key insights
LLMs can detect HIV-related stigma in clinical notes, with the best model reaching a Micro-F1 of 0.62; fine-tuned encoder models outperformed generative models prompted zero-shot.
Principles
- Few-shot prompting enhances generative LLM performance.
- Stigma subscales vary in detection difficulty.
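To make the few-shot principle concrete, here is a minimal sketch of how a k-shot classification prompt could be assembled. The prompt wording, the "No Stigma" label, the function name `build_few_shot_prompt`, and the example sentences are all hypothetical; the study's actual prompts are not given in this summary.

```python
from typing import List, Tuple

# The four subscales come from the summary; "No Stigma" is an assumed extra label.
LABELS = [
    "Concern with Public Attitudes",
    "Disclosure Concerns",
    "Negative Self-Image",
    "Personalized Stigma",
    "No Stigma",
]


def build_few_shot_prompt(
    examples: List[Tuple[str, str]], sentence: str, k: int = 5
) -> str:
    """Build a prompt with up to k labeled examples followed by the query.
    With k=0 this reduces to a zero-shot prompt."""
    lines = [
        "Classify the sentence from a clinical note into one of: "
        + "; ".join(LABELS) + ".",
        "",
    ]
    for text, label in examples[:k]:
        lines += [f"Sentence: {text}", f"Label: {label}", ""]
    lines += [f"Sentence: {sentence}", "Label:"]
    return "\n".join(lines)


# Synthetic demonstration sentences (assumption: not from the study's data).
examples = [
    ("Patient worries that coworkers will find out about the diagnosis.",
     "Disclosure Concerns"),
    ("Patient states feeling ashamed of the diagnosis.",
     "Negative Self-Image"),
]

prompt = build_few_shot_prompt(examples, "Patient has not told family members.", k=2)
print(prompt)
```

The resulting string would be sent to whichever generative model is under evaluation; the model call itself is omitted because it depends on the serving setup.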
Method
Candidate sentences were identified via expert-curated keywords and clinical word embeddings, then manually annotated across four stigma subscales. Fine-tuned encoder models were compared with generative LLMs, which were evaluated using zero-shot and few-shot prompting.
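The keyword-based candidate step can be sketched roughly as follows. The keyword list, the regex-based sentence splitter, and the function names are hypothetical stand-ins, and the word-embedding expansion of keywords described in the study is omitted here.

```python
import re
from typing import Iterable, List

# Hypothetical seed keywords; the study's expert-curated list is not given here.
KEYWORDS = {"stigma", "ashamed", "disclose", "disclosure", "judged"}


def split_sentences(note: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", note) if s.strip()]


def candidate_sentences(note: str, keywords: Iterable[str] = KEYWORDS) -> List[str]:
    """Return sentences containing at least one keyword as a whole word."""
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(k) for k in sorted(keywords)) + r")\b",
        re.IGNORECASE,
    )
    return [s for s in split_sentences(note) if pattern.search(s)]


# Synthetic note text (assumption: invented for illustration).
note = (
    "Patient seen for follow-up. Reports feeling ashamed of the diagnosis. "
    "Afraid to disclose status to family. Labs ordered."
)
candidates = candidate_sentences(note)
for sentence in candidates:
    print(sentence)
```

In a pipeline like the one described, the sentences this filter returns would go to human annotators; the filter narrows the pool and does not itself assign labels.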
In practice
- Consider GatorTron-large for HIV stigma detection; it achieved the highest Micro-F1 (0.62) in this study.
- Apply few-shot prompting for generative LLMs.
Topics
- HIV Stigma Detection
- Clinical Note Analysis
- Large Language Models
- Natural Language Processing
- GatorTron-large
Best for: AI Scientist, NLP Engineer, Research Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.