Why LLMs Are Worse at Math in Other Languages — And It’s Not Just the Data

2026-06-23 · Source: LLM on Medium · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, quick

Summary

A recent paper investigates why Large Language Models (LLMs) exhibit lower math accuracy in non-English languages, particularly low-resource ones, even when controlling for training data. Employing a mechanistic interpretability approach, similar to MathNeurosurgery, the research uses statistical signals from the forward pass to identify "math parameters" within models like Llama and Qwen. Findings indicate these math parameters partially overlap across languages, with the highest concentration of shared cross-lingual arithmetic modules in the model's middle layers. English consistently utilizes the largest set of math-related parameters, while low-resource languages allocate noticeably fewer. Weight intervention experiments confirmed these parameters are critical for math ability, suggesting performance gaps stem from parameter allocation, not solely data volume.

Key takeaway

Machine learning engineers optimizing multilingual LLMs should not treat poor math performance in low-resource languages as solely a data problem: the model may be allocating too few "math parameters" to those languages. Beyond adding more data, consider targeted architectural interventions or parameter allocation strategies to improve cross-lingual arithmetic, especially in the model's middle layers, where the shared arithmetic modules concentrate.

Key insights

LLM math performance differences across languages are linked to varying allocations and overlaps of "math parameters" within the model's architecture.

Principles

LLM math parameters partially overlap across languages.
Cross-lingual arithmetic modules concentrate in middle layers.
English utilizes more math-related parameters than low-resource languages.
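
One way to make the overlap claims above concrete is to compare, layer by layer, the sets of parameters flagged as math-related for each language. The sketch below is a minimal illustration, not the paper's procedure: the Jaccard metric, the function names, and the `toy_masks` values are all assumptions chosen for demonstration, and the numbers are made up rather than taken from any model.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Intersection-over-union of two sets of parameter indices."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def layerwise_overlap(masks: dict) -> dict:
    """Mean pairwise Jaccard overlap per layer across all language pairs.

    masks[lang][layer] is the set of parameter indices flagged as
    math-related for that language in that layer (hypothetical format).
    """
    layers = sorted(set().union(*(m.keys() for m in masks.values())))
    pairs = list(combinations(sorted(masks), 2))
    result = {}
    for layer in layers:
        scores = [
            jaccard(masks[a].get(layer, set()), masks[b].get(layer, set()))
            for a, b in pairs
        ]
        result[layer] = sum(scores) / len(scores) if scores else 0.0
    return result


# Made-up toy data: three layers, two languages. Not real measurements.
toy_masks = {
    "en": {0: {1, 2, 3, 4}, 1: {1, 2, 3, 4, 5, 6}, 2: {1, 2, 3}},
    "sw": {0: {4, 9}, 1: {1, 2, 3, 4}, 2: {7}},
}
overlap = layerwise_overlap(toy_masks)
sizes = {lang: sum(len(s) for s in layers.values())
         for lang, layers in toy_masks.items()}
```

The toy data is deliberately arranged so that the middle layer (index 1) shows the highest overlap and the English set is larger overall, mirroring the pattern the summary describes. With real masks, the same two quantities (per-layer overlap and per-language set size) would be the ones to inspect.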

Method

Identify critical "math parameters" using statistical signals from the forward pass, then extract and compare them across languages in open-source models like Llama and Qwen.
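
The summary does not specify which forward-pass statistic is used, so the following is only a sketch under stated assumptions. It assumes a pruning-style importance score (absolute weight times the L2 norm of the corresponding input activation), keeps the top-scoring fraction of weights for math inputs, and subtracts those that also rank highly on general inputs. The function names, the `fraction` parameter, and the toy matrices are hypothetical; `ablate` illustrates a simple weight intervention by zeroing the selected weights.

```python
import math


def importance(weights, activations):
    """Score weight W[i][j] as |W[i][j]| * L2 norm of input feature j.

    `activations` holds forward-pass inputs to the layer, one row per sample.
    This scoring rule is an assumption, not confirmed by the source.
    """
    n_in = len(weights[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in activations))
             for j in range(n_in)]
    return {(i, j): abs(w) * norms[j]
            for i, row in enumerate(weights)
            for j, w in enumerate(row)}


def top_fraction(scores, fraction):
    """Indices of the highest-scoring `fraction` of weights (at least one)."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def math_specific(weights, math_acts, general_acts, fraction=0.25):
    """Weights important for math inputs but not for general inputs."""
    math_top = top_fraction(importance(weights, math_acts), fraction)
    general_top = top_fraction(importance(weights, general_acts), fraction)
    return math_top - general_top


def ablate(weights, selected):
    """Return a copy of `weights` with the selected entries set to zero."""
    return [[0.0 if (i, j) in selected else w for j, w in enumerate(row)]
            for i, row in enumerate(weights)]


# Made-up 2x2 layer and activations, for illustration only.
weights = [[2.0, 1.0], [0.5, 3.0]]
math_acts = [[0.0, 1.0], [0.0, 2.0]]      # math inputs excite feature 1
general_acts = [[1.0, 0.0], [2.0, 0.0]]   # general inputs excite feature 0

selected = math_specific(weights, math_acts, general_acts)
ablated = ablate(weights, selected)
```

Running the selection once per language on math prompts in that language yields one parameter set per language, which can then be compared across languages. Evaluating the model with `ablated` weights against the original is the kind of intervention that tests whether the selected parameters actually matter for math accuracy.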

In practice

Investigate parameter allocation strategies for low-resource languages.
Focus multilingual math improvements on the model's middle layers, where shared cross-lingual arithmetic modules concentrate.

Topics

Large Language Models
Multilingual AI
Mechanistic Interpretability
Math Reasoning
Parameter Allocation
Low-Resource Languages

Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by LLM on Medium.