Creating Multilingual Mental Health Dialogue Datasets: Limits of Persona-Based Localization via Nationality and Language

2026-06-19 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Mental Health & Psychological Support · Depth: Expert, extended

Summary

Research by Yunkai Xu and Saeed Abdullah examines whether multilingual mental health dialogue datasets can be created through persona-based localization. The authors modified the nationality and language parameters of clinically validated English personas to generate dialogues in Mandarin, Bengali, and Hindi. LLM judges, including GPT-4o-mini, DeepSeek-V3.2, LLaMA3.1-8B, Qwen3-8B, and DeepSeek-R1-8B, then evaluated the dialogues for depression severity. The findings indicate that merely setting nationality and language parameters introduces clinical inconsistency across languages. The judges were often inaccurate on non-English text, and performance varied significantly by model: smaller models such as DeepSeek-R1-8B and LLaMA3.1-8B showed substantial accuracy drops and higher cross-severity errors in non-English contexts. These results point to systemic limitations of applying English-centric personas to multilingual settings and underscore the need for culturally responsive data generation.
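
The minimal localization described above can be pictured with a short sketch. This is an illustration only, not the authors' code: the `Persona` fields, the `localize_minimal` helper, the persona text, and the nationality values paired with each language are all hypothetical. The point it shows is that the clinical content is copied verbatim and only two labels change.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Persona:
    # Hypothetical schema; the paper's actual persona format is not shown here.
    description: str   # clinical presentation, written for an English context
    severity: str      # intended depression severity label
    nationality: str
    language: str


def localize_minimal(persona: Persona, nationality: str, language: str) -> Persona:
    """Copy the persona and change only nationality and language."""
    return replace(persona, nationality=nationality, language=language)


# Placeholder persona text, invented for illustration.
base = Persona(
    description="Reports low mood and poor sleep for several weeks.",
    severity="moderate",
    nationality="unspecified",
    language="English",
)

# Illustrative (nationality, language) pairs for the three target languages.
targets = [("Chinese", "Mandarin"), ("Bangladeshi", "Bengali"), ("Indian", "Hindi")]
variants = [localize_minimal(base, nat, lang) for nat, lang in targets]

for v in variants:
    # The symptom description and severity label are identical in every variant.
    print(v.language, "|", v.severity, "|", v.description)
```

Nothing in the variant personas reflects how distress is expressed in each language or culture, which is the gap the study probes.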

Key takeaway

For AI scientists and NLP engineers developing LLM-based mental health support systems for global populations, simple persona localization (changing nationality and language in English-centric templates) is insufficient. Treat multilingual persona construction as a distinct design and validation process that incorporates culturally grounded expression and rigorous output-level evaluation. Doing so helps preserve clinical consistency and mitigate systemic biases, supporting more equitable and effective digital mental health solutions.

Key insights

Simple nationality and language parameter changes in English-centric personas fail to preserve clinical consistency in multilingual mental health dialogues.

Principles

Minimal persona localization introduces clinical inconsistency across languages.
LLM judge performance varies significantly across languages and models for mental health assessment.
Multilingual clinical personas require output-level validation, not just template extension.

Method

An LLM-based therapist agent generated dialogues from personas whose nationality and language had been modified. Independent LLM judges then performed blind pairwise severity comparisons, which were scored on overall accuracy, same-level error rate, and tie distance.
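
A rough sketch of how pairwise judge verdicts could be scored per language follows. It covers only overall accuracy, because this summary does not define same-level error rate or tie distance. The `Comparison` record, the integer severity levels, the `"tie"` verdict, and the toy rows are assumptions made for illustration, not data or code from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comparison:
    # Hypothetical record of one blind pairwise judgment.
    language: str
    level_a: int   # intended severity level of dialogue A (from its persona)
    level_b: int   # intended severity level of dialogue B
    verdict: str   # judge output: "A" (A more severe), "B", or "tie"


def gold_verdict(c: Comparison) -> str:
    """Expected answer implied by the personas' intended severity levels."""
    if c.level_a > c.level_b:
        return "A"
    if c.level_a < c.level_b:
        return "B"
    return "tie"


def accuracy_by_language(comparisons):
    """Fraction of verdicts matching the expected answer, per language."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for c in comparisons:
        totals[c.language] += 1
        hits[c.language] += int(c.verdict == gold_verdict(c))
    return {lang: hits[lang] / totals[lang] for lang in totals}


# Toy rows invented for this example; they are not results from the study.
toy = [
    Comparison("English", 3, 1, "A"),
    Comparison("English", 2, 2, "tie"),
    Comparison("Hindi", 3, 1, "A"),
    Comparison("Hindi", 1, 2, "A"),
]
scores = accuracy_by_language(toy)
print(scores)
```

Breaking accuracy out by language (and, in a fuller version, by judge) is what makes a cross-lingual drop visible instead of being averaged away.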

In practice

Rigorously validate multilingual synthetic data outputs for clinical consistency.
Avoid direct translation or minimal parameter changes for culturally sensitive data.
Employ multiple LLM judges and human review for robust cross-lingual evaluation.
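
One way to combine several judges and route uncertain cases to people is sketched below. The majority-vote rule, the `min_agreement` threshold of 0.6, and the judge names are assumptions chosen for illustration; the source does not prescribe an aggregation scheme.

```python
from collections import Counter


def aggregate_verdicts(verdicts: dict, min_agreement: float = 0.6) -> dict:
    """Majority-vote over judge verdicts and flag weak consensus for human review.

    `verdicts` maps a judge name to that judge's label for one item.
    """
    if not verdicts:
        raise ValueError("at least one judge verdict is required")
    counts = Counter(verdicts.values())
    (top_label, top_count), *rest = counts.most_common()
    agreement = top_count / len(verdicts)
    # A tie for first place means there is no majority label at all.
    tied_for_top = bool(rest) and rest[0][1] == top_count
    return {
        "label": None if tied_for_top else top_label,
        "agreement": agreement,
        "needs_human_review": tied_for_top or agreement < min_agreement,
    }


# Hypothetical judge names and labels.
clear_case = aggregate_verdicts({"judge_1": "A", "judge_2": "A", "judge_3": "B"})
split_case = aggregate_verdicts({"judge_1": "A", "judge_2": "B"})
print(clear_case)
print(split_case)
```

Judge agreement is not the same as correctness: several judges can share the same failure on non-English text, so periodic human review of agreed items is still worth sampling.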

Topics

Multilingual LLMs
Mental Health Datasets
Synthetic Data Generation
Persona-based AI
Cross-cultural Bias
Depression Severity Assessment

Code references

Xuyk021/CLPsych2026workshop

Best for: AI Scientist, NLP Engineer, Research Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.