Rethinking Cross-lingual Gaps from a Statistical Viewpoint

2026-06-18 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

A study by Google DeepMind and Google Research re-examines the "cross-lingual gap" in Large Language Models (LLMs): the drop in accuracy when a piece of knowledge is queried in a target language rather than in its source language. Challenging prior explanations based on knowledge barriers or representation misalignment, the authors hypothesize that higher response variance in the target language is the dominant cause, and they formalize the gap using a bias-variance decomposition. Experiments across five LLMs (Gemini-2.5-Flash, Gemini-2.5-Pro, GPT-5, GPT-5-mini, and DeepSeek-R1) on the ECLeKTic and MMLU (with mixup) benchmarks provide evidence for this hypothesis. The study shows that inference-time interventions, such as response ensembling (sampling ten responses per example) and input ensembling (e.g., Translate-then-Answer), reduce the gap. A simple prompt instruction improved target accuracy by 20-25% across different models, further indicating that reducing response variance is key to mitigating cross-lingual performance disparities.

Key takeaway

For Machine Learning Engineers deploying multilingual LLMs: if accuracy differs across languages, try inference-time interventions before complex pretraining adjustments. Sample multiple responses and ensemble them, or apply input ensembling such as Translate-then-Answer, to reduce response variance. The study reports that a simple prompt instruction improved target-language accuracy by 20-25% across models, with no model retraining required.

Key insights

According to the study, LLM cross-lingual gaps stem mainly from response variance rather than knowledge barriers, and can be reduced by controlling that variance.

Principles

Cross-lingual gaps are variance-driven, not knowledge-driven.
Source and target response variance are proportional.
High source confidence reduces cross-lingual gaps.

Method

Formalize cross-lingual gaps via bias-variance decomposition, then apply inference-time interventions like response or input ensembling to reduce target response variance.
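
For orientation, the textbook bias-variance decomposition for squared error is sketched below. This is the standard identity, not necessarily the exact formulation used in the paper. The symbols are assumptions for illustration: y is a fixed reference score, \hat{y} is the score of one sampled response, and \bar{y}_n is the average over n independent samples.

```latex
\mathbb{E}\big[(\hat{y} - y)^2\big]
  = \underbrace{\big(\mathbb{E}[\hat{y}] - y\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\big[(\hat{y} - \mathbb{E}[\hat{y}])^2\big]}_{\text{variance}}

% Averaging n i.i.d. samples leaves the bias unchanged and shrinks the variance:
\mathbb{E}[\bar{y}_n] = \mathbb{E}[\hat{y}],
\qquad
\operatorname{Var}(\bar{y}_n) = \frac{\operatorname{Var}(\hat{y})}{n}
```

Read this way, an error that sits mostly in the variance term shrinks when several responses are combined, while an error in the bias term does not. That is consistent with the summary's claim that ensembling reduces the gap.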

In practice

Sample multiple responses and ensemble them.
Use prompt instructions for implicit ensembling.
Focus on improving source language confidence.
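
The first practice above can be sketched as a majority vote over sampled responses. This is a minimal illustration, not the authors' implementation: `sample_fn` is a hypothetical stand-in for whatever model call you use, `make_toy_sampler` simulates a model whose most likely answer is correct but whose samples are noisy, and the agreement rate returned alongside the answer is an assumed proxy for confidence.

```python
import random
from collections import Counter
from typing import Callable, Tuple


def normalize(answer: str) -> str:
    """Light normalization so trivially different strings count as one vote."""
    return answer.strip().lower()


def ensemble_answer(
    sample_fn: Callable[[str], str], prompt: str, n: int = 10
) -> Tuple[str, float]:
    """Sample n responses and return (majority answer, agreement rate)."""
    votes = Counter(normalize(sample_fn(prompt)) for _ in range(n))
    answer, count = votes.most_common(1)[0]
    return answer, count / n


def make_toy_sampler(p_correct: float, seed: int) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM call (assumption, not a real API).

    The correct answer is always the single most likely output, but
    sampling noise returns one of four wrong answers otherwise.
    """
    rng = random.Random(seed)

    def sample(prompt: str) -> str:
        if rng.random() < p_correct:
            return "Paris"
        return f"wrong-{rng.randrange(4)}"

    return sample


def accuracy(p_correct: float, n: int, trials: int = 2000, seed: int = 0) -> float:
    """Fraction of trials in which the n-sample majority vote is correct."""
    sampler = make_toy_sampler(p_correct, seed)
    hits = 0
    for _ in range(trials):
        answer, _agreement = ensemble_answer(sampler, "Capital of France?", n)
        hits += answer == "paris"
    return hits / trials


if __name__ == "__main__":
    # Compare a single sample with a ten-sample vote on the noisy toy model.
    for n in (1, 10):
        print(f"n={n:2d}  accuracy={accuracy(0.5, n):.3f}")
```

In the toy setup the model "knows" the answer (it is the mode of the output distribution), so any single-sample error comes from variance alone, and voting over ten samples recovers much of it. With a real model, answers usually need stronger normalization or extraction before voting, and ties are broken here by first occurrence.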

Topics

Cross-lingual Gaps
Large Language Models
Response Variance
Bias-Variance Decomposition
Inference-Time Interventions
Prompt Engineering

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.