Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs

2026-05-05 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, quick

Summary

A paper submitted on May 2, 2026, titled "Lost in the Tower of Babel: The Adverse Effects of Incidental Multilingualism in LLMs," argues that current large language models (LLMs) exhibit fragile and misleading multilingual capabilities. The authors, Anjishnu Mukherjee, Chutong Meng, and Antonios Anastasopoulos, contend that LLMs appear multilingual due to training on vast, uneven web corpora rather than intentional design for multilingual competence. This incidental approach leads to unequal, brittle, and opaque behavior across languages, posing severe consequences for real-world deployments requiring cross-linguistic reasoning. The study empirically investigates which languages models claim to support versus those they actually respond in, demonstrating how simple language-change attacks expose these failures and hidden assumptions within LLM systems.

Key takeaway

For research scientists developing or deploying multilingual LLMs, you should critically evaluate models for "multilingualism by design" rather than relying on incidental capabilities. Prioritize equitable multilingual performance and cultural grounding as first-class goals in your model pipeline to avoid brittle cross-lingual behavior and hidden language assumptions in real-world applications.

Key insights

Incidental multilingualism in LLMs leads to brittle, unequal, and opaque cross-lingual performance.

Principles

Multilingual competence is not a core LLM design objective.
Uneven web corpora cause fragile multilingual behavior.

Method

The study empirically investigates LLM self-reported language support versus actual response languages, using language-change attacks to expose failures.

In practice

Test LLMs with language-change attacks.
Verify actual language response against claimed support.

Topics

Incidental Multilingualism
Large Language Models
Multilingual NLP
Cross-lingual Behavior
Language-Change Attacks

Best for: Research Scientist, AI Scientist, NLP Engineer, AI Ethicist

Related on AIssential

Open in AIssential →

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.