Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs
Summary
LocQA is a new test set for quantifying implicit inter- and intra-lingual biases in multilingual large language models (LLMs). It comprises 2,156 locale-ambiguous questions across 12 languages. The questions cover facts such as laws, dates, and measurements, and contain no explicit locale indicators. Evaluating 32 LLMs with LocQA, the researchers identified two structural biases. Inter-lingually, models show a global bias toward US-locale answers, even when queried in non-English languages, and this bias is stronger in instruction-tuned models than in their base versions. Intra-lingually, when a single language is associated with several locales, models favor the locales with larger populations, effectively acting as demographic probability engines. These findings offer insights for shaping the local behavior of LLMs and for assessing how training phases affect bias.
Key takeaway
For research scientists and engineers developing multilingual LLMs, understanding these implicit biases is crucial. Your models likely default to US-centric answers and, within a language, favor the locales with larger populations; the US-centric default is stronger in instruction-tuned models. Integrate a test set such as LocQA into your evaluation pipelines to identify and mitigate these structural biases, so that responses are more culturally neutral and accurate across diverse linguistic contexts.
Key insights
Multilingual LLMs exhibit implicit US-centric and demographic biases when answering locale-ambiguous questions.
Principles
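- Instruction tuning exacerbates the global bias: instruction-tuned models lean further toward US-locale answers than their base versions.
- LLMs act as demographic probability engines: when a language spans several locales, they favor the ones with larger populations.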
Method
LocQA is a test set of 2,156 locale-ambiguous questions in 12 languages. The questions ask about facts such as laws, dates, and measurements and carry no explicit locale indicators, so the locale a model assumes in its answer exposes its implicit bias.
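As a rough illustration of how answers to such questions could be scored, the sketch below matches each model answer against locale-specific reference answers and reports the share of answers attributed to each locale. The record layout, the `match_locale` and `locale_shares` helpers, and the sample answers are assumptions made for illustration; they are not taken from LocQA or its evaluation code.

```python
from collections import Counter

def match_locale(answer, references):
    """Return the first locale whose reference string occurs in the answer, else None.

    Simple substring matching; if an answer mentions several locales'
    references, the first one listed in `references` wins.
    """
    text = answer.casefold()
    for locale, ref in references.items():
        if ref.casefold() in text:
            return locale
    return None

def locale_shares(records):
    """Fraction of answers attributed to each locale (key None = no match)."""
    counts = Counter(match_locale(r["answer"], r["references"]) for r in records)
    total = sum(counts.values())
    return {locale: n / total for locale, n in counts.items()}

# Hypothetical model answers to German questions with no locale indicator.
emergency = {"DE": "112", "US": "911"}
temperature = {"DE": "Celsius", "US": "Fahrenheit"}
records = [
    {"answer": "Der Notruf ist unter 911 erreichbar.", "references": emergency},
    {"answer": "Die Temperatur wird in Celsius angegeben.", "references": temperature},
    {"answer": "Wählen Sie 911.", "references": emergency},
    {"answer": "Das hängt vom Land ab.", "references": emergency},
]

shares = locale_shares(records)
print(shares)
```

A high US share for questions asked in a non-English language would point to the inter-lingual bias described above. A real pipeline would likely need more robust answer matching than substring search.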
In practice
- Evaluate LLMs for US-locale bias.
- Assess instruction tuning's bias impact.
- Consider demographic priors in LLM outputs.
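The checks above could be wired into an evaluation pipeline along the following lines. This is a minimal sketch: the function names, the locale labels, and the toy inputs are assumptions for illustration, and the population values are placeholders rather than real statistics. None of it is code or data from the LocQA paper.

```python
from collections import Counter

def us_share(locales):
    """Fraction of answers whose matched locale is 'US'."""
    return sum(1 for loc in locales if loc == "US") / len(locales)

def instruction_tuning_shift(base_locales, tuned_locales):
    """Change in US-locale share from base to instruction-tuned model.

    A positive value means the tuned model answers with the US locale more often.
    """
    return us_share(tuned_locales) - us_share(base_locales)

def follows_population_prior(locales, populations):
    """True if the most frequently answered candidate locale is the most populous one.

    Only locales present in `populations` are counted; returns False if none appear.
    """
    counts = Counter(loc for loc in locales if loc in populations)
    if not counts:
        return False
    most_answered = counts.most_common(1)[0][0]
    most_populous = max(populations, key=populations.get)
    return most_answered == most_populous

# Hypothetical matched locales for the same four questions, per model variant.
base_locales = ["US", "DE", "DE", "AT"]
tuned_locales = ["US", "US", "DE", "AT"]

# Placeholder relative population sizes for two locales sharing one language.
populations = {"DE": 3, "AT": 1}

shift = instruction_tuning_shift(base_locales, tuned_locales)
prior = follows_population_prior(base_locales, populations)
print(shift, prior)
```

Reporting the shift per language, rather than pooled, would make it easier to see where instruction tuning changes behavior the most.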
Topics
- Multilingual LLMs
- Implicit Bias
- LocQA Benchmark
- Inter-lingual Bias
- Intra-lingual Bias
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, NLP Engineer, AI Ethicist
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.