The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs

Source: cs.CL updates on arXiv.org · Field: Technology & Digital, Artificial Intelligence & Machine Learning, Data Science & Analytics · Depth: Expert, extended

Summary

The study "The Masked Advantage: Uncovering Local-Language Access to Cultural Knowledge in LLMs" by Zhang et al. introduces a controlled framework that separates language proficiency from access to localized cultural knowledge in large language models. The authors evaluate approximately 80 models across 13 locales and fit a 1PL item response theory model to performance on culture-agnostic and culture-specific questions, each asked in English and in the local language. English shows a consistent proficiency advantage (mean GlobalGap of -0.79). Once that gap is accounted for, the local language shows a positive knowledge-access advantage in 98% of locale–model settings. This advantage is often masked by weaker local-language proficiency and becomes visible in frontier, regionally aligned, or language-adapted models.

Key takeaway

If you are an AI scientist or ML engineer evaluating multilingual LLMs, note that lower local-language performance often stems from proficiency gaps rather than missing cultural knowledge. A model may hold a local-language knowledge advantage that stays hidden until targeted language adaptation or stronger multilingual training reveals it. Focus on improving linguistic alignment to surface this existing cultural understanding instead of assuming the knowledge is absent.

Key insights

LLMs often access local cultural knowledge better in the local language, even though proficiency gaps lower their raw local-language accuracy.

Method

A controlled framework fits a shared 1PL item response theory model across question types and query languages to estimate three quantities: GlobalGap (proficiency), LocalGap (the combined effect on culture-specific questions), and KnowledgeGap (knowledge access in isolation).
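
The three gaps can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made for illustration: item difficulties are treated as known (the paper fits a shared model), ability is estimated per condition with a weak Gaussian prior so that all-correct or all-wrong response patterns stay finite, each gap is taken as local-language ability minus English ability, and KnowledgeGap is taken as LocalGap minus GlobalGap. The function names and the toy responses are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def estimate_ability(responses, difficulties, prior_var=100.0):
    """MAP estimate of ability theta under a 1PL (Rasch) model.

    P(correct on item i) = sigmoid(theta - difficulty_i).
    The weak zero-mean Gaussian prior (variance prior_var) is an
    assumption of this sketch, not something taken from the paper.
    """
    def score(theta):
        # Derivative of the log-posterior; strictly decreasing in theta.
        residual = sum(x - sigmoid(theta - b)
                       for x, b in zip(responses, difficulties))
        return residual - theta / prior_var

    lo, hi = -10.0, 10.0
    for _ in range(60):  # bisection on the monotone score function
        mid = (lo + hi) / 2.0
        if score(mid) > 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def compute_gaps(resp, difficulties):
    """resp maps (language, question_type) to a list of 0/1 responses."""
    theta = {key: estimate_ability(r, difficulties[key[1]])
             for key, r in resp.items()}
    # Assumed sign convention: local-language ability minus English ability.
    global_gap = theta[("local", "agnostic")] - theta[("english", "agnostic")]
    local_gap = theta[("local", "specific")] - theta[("english", "specific")]
    # Assumed definition: what remains after removing the proficiency gap.
    knowledge_gap = local_gap - global_gap
    return global_gap, local_gap, knowledge_gap


# Hypothetical toy data: 10 items per question type, all with difficulty 0.
difficulties = {"agnostic": [0.0] * 10, "specific": [0.0] * 10}
resp = {
    ("english", "agnostic"): [1] * 8 + [0] * 2,
    ("local", "agnostic"):   [1] * 6 + [0] * 4,
    ("english", "specific"): [1] * 6 + [0] * 4,
    ("local", "specific"):   [1] * 5 + [0] * 5,
}

global_gap, local_gap, knowledge_gap = compute_gaps(resp, difficulties)
print(f"GlobalGap={global_gap:.2f} LocalGap={local_gap:.2f} "
      f"KnowledgeGap={knowledge_gap:.2f}")
```

In this toy data the local language scores lower than English on both question types, so GlobalGap and LocalGap are both negative, yet the difference between them is positive. That is the masking pattern the summary describes: a knowledge-access advantage hidden behind a larger proficiency deficit.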

Best for: Research Scientist, AI Engineer, AI Product Manager, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.