Location Not Found: Exposing Implicit Local and Global Biases in Multilingual LLMs

2026-04-22 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

LocQA is a new diagnostic benchmark for quantifying implicit local and global biases in multilingual large language models (LLMs). It comprises 2,156 locale-ambiguous questions in 12 languages, covering 49 regions. The questions concern locale-dependent facts such as laws, dates, and measurements, and carry no explicit locale indicators. Evaluating 32 LLMs, the researchers identified two structural biases: a "Global Bias," a persistent US-centric default that holds even when models are queried in non-English languages, and a "Regional Bias," in which models favor the more populous locales among those sharing a language. Instruction tuning exacerbates the US-centric bias (a "Cultural Alignment Tax"). It also reduces regional distortion by increasing answer multiplicity, though the additional context is often anchored to US norms. The average Global Bias across models was 0.24: US answers appeared in 50% of model outputs, compared with 26% in the data.
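The Global Bias figure above can be checked with simple arithmetic. The sketch below assumes the score is the plain difference between the share of US answers in model outputs and the share of US answers in the data, which is consistent with the reported 0.24 (50% versus 26%). The paper may define the metric more carefully, and the function names `us_share` and `global_bias` are hypothetical.

```python
def us_share(labels, target="US"):
    """Fraction of locale labels equal to the target locale."""
    if not labels:
        raise ValueError("labels must be non-empty")
    return sum(1 for label in labels if label == target) / len(labels)


def global_bias(output_share, data_share):
    """Assumed definition: US share in model outputs minus US share in the data."""
    return output_share - data_share


# Shares reported in the summary: 50% of outputs vs. 26% of the data.
print(round(global_bias(0.50, 0.26), 2))  # 0.24
```

Under this reading, a score of 0 would mean US answers appear exactly as often as the data warrants, and positive values indicate over-representation.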

Key takeaway

Research scientists developing or deploying multilingual LLMs should explicitly evaluate their models for implicit biases using benchmarks like LocQA. Instruction tuning, while improving general helpfulness, can inadvertently increase US-centric bias and may lead to cultural erasure for non-US locales. Prioritize multicultural and multi-regional modeling approaches to ensure factual adequacy and equitable representation for global audiences, rather than relying solely on linguistic fluency.

Key insights

Multilingual LLMs exhibit implicit US-centric and population-driven biases, exacerbated by instruction tuning.

Principles

Linguistic fluency does not guarantee cultural grounding.
Instruction tuning can increase US-centric bias.
Model representation scales logarithmically with population size.

Method

LocQA is a diagnostic benchmark with 2,156 locale-ambiguous questions across 12 languages and 49 regions. It uses an LLM-as-a-Judge pipeline to evaluate implicit biases by analyzing model responses to questions lacking explicit locale context.
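The evaluation loop described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the item schema (`question`, `answers`), the `locale_coverage` function, and the `keyword_judge` stand-in are all assumptions. In particular, `keyword_judge` replaces the LLM judge with a plain substring match so the example runs offline. The toy item is illustrative and not taken from LocQA.

```python
from collections import Counter


def keyword_judge(response, candidates):
    """Stand-in for an LLM judge (assumption): a locale counts as covered
    if its reference answer appears verbatim in the response."""
    text = response.lower()
    return {loc for loc, answer in candidates.items() if answer.lower() in text}


def locale_coverage(items, model, judge=keyword_judge):
    """Share of questions whose response covers each locale's answer.

    A response may cover several locales (answer multiplicity),
    so the shares need not sum to 1.
    """
    if not items:
        raise ValueError("items must be non-empty")
    counts = Counter()
    for item in items:
        response = model(item["question"])  # no locale hint is given
        counts.update(judge(response, item["answers"]))
    return {loc: n / len(items) for loc, n in counts.items()}


# Toy item (hypothetical schema): one question, per-locale reference answers.
items = [{
    "question": "What is the emergency phone number?",
    "answers": {"US": "911", "GB": "999"},
}]

# Toy model that silently assumes a US locale.
print(locale_coverage(items, lambda q: "Call 911."))  # {'US': 1.0}
```

Swapping `keyword_judge` for a call to a real judge model would keep the rest of the loop unchanged, and the per-locale shares it returns are the kind of quantity a bias score could be computed from.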

In practice

Use LocQA to evaluate LLM localization capabilities.
Scrutinize instruction-tuned models for increased US bias.
Implement multicultural training to mitigate regional erasure.

Topics

Multilingual LLMs
Implicit Bias
LocQA Benchmark
Global Bias
Regional Bias

Code references

google-research-datasets/locqa

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Ethicist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.