Creating Multilingual Mental Health Dialogue Datasets: Limits of Persona-Based Localization via Nationality and Language

Source: cs.CL updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Data Science & Analytics, Mental Health & Psychological Support) · Depth: Expert, extended

Summary

Yunkai Xu and Saeed Abdullah investigate whether multilingual mental health dialogue datasets can be created through persona-based localization. They modified the nationality and language parameters of clinically validated English personas to generate dialogues in Mandarin, Bengali, and Hindi. LLM judges, including GPT-4o-mini, DeepSeek-V3.2, LLaMA3.1-8B, Qwen3-8B, and DeepSeek-R1-8B, then evaluated the dialogues for depression severity. The findings indicate that changing only the nationality and language parameters introduces clinical inconsistency across languages. The judges were often less accurate on non-English text, and performance varied significantly by model: smaller models such as DeepSeek-R1-8B and LLaMA3.1-8B showed substantial accuracy drops and more cross-severity errors in non-English contexts. These results point to systemic limitations in applying English-centric personas to multilingual settings and underscore the need for culturally responsive data generation.

Key takeaway

For AI scientists and NLP engineers building LLM-based mental health support systems for global populations, localizing personas by changing only the nationality and language in English-centric templates is insufficient. Treat multilingual persona construction as a distinct design and validation process that incorporates culturally grounded expression and rigorous output-level evaluation. Doing so helps preserve clinical consistency and reduce systemic bias, which supports more equitable and effective digital mental health solutions.

Key insights

Simple nationality and language parameter changes in English-centric personas fail to preserve clinical consistency in multilingual mental health dialogues.

Method

An LLM-based therapist agent generated dialogues from personas whose nationality and language had been modified. Independent LLM judges then performed blind pairwise severity comparisons, which were scored by overall accuracy, same-level error rate, and tie distance.
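
The localization step and the accuracy scoring can be illustrated with a minimal sketch. It is not the authors' implementation: the `Persona` fields, the severity labels in `SEVERITY_ORDER`, and the `"A"`/`"B"`/`"tie"` encoding of a judge's choice are all assumptions made for illustration. Only overall accuracy is shown, since the exact definitions of same-level error rate and tie distance are not given here.

```python
from collections import defaultdict
from dataclasses import dataclass, replace

# Assumed severity labels, ordered from least to most severe (hypothetical).
SEVERITY_ORDER = ["minimal", "mild", "moderate", "severe"]


@dataclass(frozen=True)
class Persona:
    """Hypothetical persona record; field names are illustrative only."""
    persona_id: str
    severity: str
    nationality: str = "American"
    language: str = "English"


def localize(persona: Persona, nationality: str, language: str) -> Persona:
    """Simple localization: change only nationality and language."""
    return replace(persona, nationality=nationality, language=language)


def expected_choice(a: Persona, b: Persona) -> str:
    """Ground-truth answer to 'which dialogue shows the more severe case?'."""
    rank_a = SEVERITY_ORDER.index(a.severity)
    rank_b = SEVERITY_ORDER.index(b.severity)
    if rank_a == rank_b:
        return "tie"
    return "A" if rank_a > rank_b else "B"


def accuracy_by_language(judgments):
    """Overall pairwise accuracy per language.

    judgments: iterable of (persona_a, persona_b, judge_choice), where both
    personas share a language and judge_choice is "A", "B", or "tie".
    """
    hits, totals = defaultdict(int), defaultdict(int)
    for a, b, choice in judgments:
        totals[a.language] += 1
        hits[a.language] += int(choice == expected_choice(a, b))
    return {lang: hits[lang] / totals[lang] for lang in totals}


mild = Persona("p1", "mild")
severe = Persona("p2", "severe")
mild_hi = localize(mild, "Indian", "Hindi")
severe_hi = localize(severe, "Indian", "Hindi")

scores = accuracy_by_language([
    (mild, severe, "B"),        # English pair, judged correctly
    (mild_hi, severe_hi, "B"),  # Hindi pair, judged correctly
    (mild_hi, severe_hi, "A"),  # Hindi pair, judged incorrectly
])
```

Because `localize` leaves `severity` untouched, any gap between the English and non-English scores reflects how the generated dialogue or the judge handled the target language, not a change in the intended clinical profile.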

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.