Why LLMs Are Worse at Math in Other Languages — And It’s Not Just the Data
Summary
A recent paper investigates why Large Language Models (LLMs) show lower math accuracy in non-English languages, particularly low-resource ones, even when controlling for training data. Following a mechanistic interpretability approach similar to MathNeurosurgery, the authors use statistical signals from the forward pass to identify "math parameters" in models like Llama and Qwen. They find that these math parameters partially overlap across languages, and that shared cross-lingual arithmetic modules are most concentrated in the model's middle layers. English consistently uses the largest set of math-related parameters, while low-resource languages are allocated noticeably fewer. Weight intervention experiments confirmed that these parameters are critical for math ability, which suggests that performance gaps stem from parameter allocation and not solely from data volume.
Key takeaway
If you are a machine learning engineer optimizing multilingual LLMs, poor math performance in low-resource languages is not solely a data problem. Your models may not be allocating enough "math parameters" to these languages. Beyond simply adding more data, consider targeted architectural interventions or parameter allocation strategies to improve cross-lingual arithmetic, especially in the model's middle layers.
Key insights
Differences in LLM math performance across languages are linked to how many "math parameters" the model allocates to each language and how much those parameters overlap.
Principles
- LLM math parameters partially overlap across languages.
- Cross-lingual arithmetic modules concentrate in middle layers.
- English utilizes more math-related parameters than low-resource languages.
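As a rough illustration of how overlap between per-language parameter sets could be quantified, the sketch below computes a Jaccard (intersection-over-union) score per layer. The boolean masks, the function names, and the choice of Jaccard as the metric are all assumptions made for this example; the summary does not state which overlap measure the paper uses, and the toy values are not results from it.

```python
import numpy as np

def jaccard(mask_a, mask_b):
    """Intersection-over-union of two boolean parameter masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # both masks empty: define the overlap as zero
    return float(np.logical_and(a, b).sum() / union)

def layerwise_overlap(masks_lang_a, masks_lang_b):
    """Jaccard overlap per layer; each input is a list with one mask per layer."""
    return [jaccard(a, b) for a, b in zip(masks_lang_a, masks_lang_b)]

# Hypothetical toy masks for three layers (1 = parameter flagged as math-related).
english = [np.array([1, 1, 0, 0]), np.array([1, 1, 1, 0]), np.array([1, 0, 0, 0])]
low_resource = [np.array([0, 1, 1, 0]), np.array([1, 1, 0, 0]), np.array([0, 0, 0, 1])]

overlaps = layerwise_overlap(english, low_resource)
sizes = [(int(e.sum()), int(l.sum())) for e, l in zip(english, low_resource)]
print(overlaps)  # one overlap score per layer
print(sizes)     # (English count, low-resource count) per layer
```

Comparing the per-layer scores shows where two languages share parameters, and comparing the mask sizes shows how many parameters each language uses.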
Method
Identify critical "math parameters" using statistical signals from the forward pass, then extract and compare them across languages in open-source models like Llama and Qwen.
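One way such forward-pass identification can work is sketched below. It assumes a score of absolute weight times input-activation norm, keeps the top fraction of weights for math inputs, and discards those that also rank highly on general text. The scoring rule, the `fraction` threshold, and all function names are assumptions for this example and are not confirmed details of the paper's method.

```python
import numpy as np

def importance(weight, activations):
    """Score each weight as |W_ij| * ||x_j||_2 over the calibration inputs.

    weight: (out_features, in_features); activations: (n_tokens, in_features).
    """
    input_norms = np.linalg.norm(activations, axis=0)  # shape (in_features,)
    return np.abs(weight) * input_norms[None, :]

def top_fraction_mask(scores, fraction):
    """Boolean mask marking the top `fraction` of entries by score."""
    k = max(1, int(round(fraction * scores.size)))
    threshold = np.partition(scores.ravel(), -k)[-k]  # k-th largest score
    return scores >= threshold

def math_parameter_mask(weight, math_acts, general_acts, fraction=0.1):
    """Weights that rank highly for math inputs but not for general-text inputs."""
    math_top = top_fraction_mask(importance(weight, math_acts), fraction)
    general_top = top_fraction_mask(importance(weight, general_acts), fraction)
    return math_top & ~general_top

# Random stand-ins for one linear layer and its recorded input activations.
rng = np.random.default_rng(0)
layer_weight = rng.normal(size=(8, 16))
math_inputs = rng.normal(size=(32, 16))     # hypothetical math-prompt activations
general_inputs = rng.normal(size=(32, 16))  # hypothetical general-text activations

mask = math_parameter_mask(layer_weight, math_inputs, general_inputs)
print(mask.sum(), "of", mask.size, "weights flagged as math-specific")
```

Running this once per language, with math prompts in that language, would yield one mask per language and layer that can then be compared.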
In practice
- Investigate parameter allocation strategies for low-resource languages.
- Focus multilingual math improvements on the model's middle layers, where shared arithmetic modules concentrate.
Topics
- Large Language Models
- Multilingual AI
- Mechanistic Interpretability
- Math Reasoning
- Parameter Allocation
- Low-Resource Languages
Best for: Research Scientist, AI Engineer, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by LLM on Medium.