LLM Parameters for Math Across Languages: Shared or Separate?
Summary
This study investigates whether the parameters behind mathematical reasoning in multilingual large language models (LLMs) are shared across languages or language-specific. The researchers analyzed Llama 1B, Qwen3 4B, and Llama 8B models across English, German, French, and Hindi, using the MathNeurosurgery framework to locate math-associated parameters and Jaccard similarity to compare them across languages. Math-associated parameters showed partial cross-lingual overlap, concentrated in intermediate model layers. English consistently had the largest set of math-relevant parameters, which correlated with its stronger reasoning performance, while lower-resource languages had smaller sets. The findings suggest that math capabilities are neither fully language-invariant nor entirely language-specific, but a blend with systematic language-dependent differences. Intervention experiments confirmed that these parameters act collectively, and scaling them mainly corrected arithmetic errors.
Key takeaway
For AI scientists and machine learning engineers developing multilingual LLMs, it matters that the parameters behind mathematical reasoning are partly language-dependent. English-centric models may not generalize efficiently to other languages, especially those written in a different script, such as Hindi. Focus on the intermediate model layers when optimizing cross-lingual math capabilities, and explore targeted parameter interventions, such as pruning, to refine output formatting and in-context learning for specific language tasks.
Key insights
Multilingual LLMs exhibit partial, layer-dependent parameter overlap for math, with English dominating.
Principles
- Math-specific parameters show partial cross-lingual overlap.
- Overlap is strongest in intermediate LLM layers.
- English-centric pathways often dominate multilingual reasoning.
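A layer-wise overlap profile is one way to check where two languages share math-associated parameters. The sketch below is illustrative only: the `(layer, row, column)` index format, the function names, and the toy index sets are assumptions made for this example, not data or code from the study.

```python
from typing import Dict, Set, Tuple

Param = Tuple[int, int, int]  # hypothetical index: (layer, row, column)


def jaccard(a: Set[Param], b: Set[Param]) -> float:
    """Intersection over union; treated as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def overlap_by_layer(a: Set[Param], b: Set[Param], num_layers: int) -> Dict[int, float]:
    """Compute the Jaccard coefficient separately for each layer."""
    return {
        layer: jaccard({p for p in a if p[0] == layer},
                       {p for p in b if p[0] == layer})
        for layer in range(num_layers)
    }


# Made-up indices for illustration; these are not results from the study.
english = {(0, 0, 1), (1, 2, 3), (1, 4, 0), (2, 1, 1)}
hindi = {(0, 5, 5), (1, 2, 3), (1, 4, 0), (2, 7, 2)}

profile = overlap_by_layer(english, hindi, num_layers=3)
```

Plotting such a profile against layer index would show whether overlap peaks in the middle of the network for a given language pair.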
Method
Identify math-specific parameters using the MathNeurosurgery framework, comparing weight-activation products on math vs. non-math datasets. Measure cross-lingual overlap with the Jaccard coefficient.
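The method can be sketched roughly as follows. This summary does not give the framework's exact scoring rule or thresholds, so the sketch assumes a simple variant: score each weight by |weight| times the mean absolute input activation, keep the top k on each dataset, and treat weights that rank highly on math data but not on non-math data as math-specific. All names and toy numbers here are hypothetical.

```python
from typing import List, Set, Tuple

Index = Tuple[int, int]  # (row, column) of a weight in one layer's matrix


def top_k_by_importance(weights: List[List[float]],
                        mean_abs_inputs: List[float],
                        k: int) -> Set[Index]:
    """Return the k weights with the largest |weight| * mean |input activation|."""
    scored = [
        (abs(w) * mean_abs_inputs[j], (i, j))
        for i, row in enumerate(weights)
        for j, w in enumerate(row)
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return {index for _, index in scored[:k]}


def math_specific(weights: List[List[float]],
                  math_inputs: List[float],
                  general_inputs: List[float],
                  k: int) -> Set[Index]:
    """Weights that rank in the top k on math data but not on non-math data."""
    return (top_k_by_importance(weights, math_inputs, k)
            - top_k_by_importance(weights, general_inputs, k))


def jaccard(a: Set[Index], b: Set[Index]) -> float:
    """Intersection over union; treated as 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Toy 2x3 weight matrix and per-input mean activations (assumed values).
weights = [[0.5, -2.0, 0.1], [1.5, 0.2, -1.0]]
general = [0.1, 0.1, 3.0]   # non-math data
math_en = [1.0, 2.0, 0.1]   # English math prompts
math_de = [0.2, 2.0, 0.1]   # German math prompts

en = math_specific(weights, math_en, general, k=3)
de = math_specific(weights, math_de, general, k=3)
overlap = jaccard(en, de)
```

In a real model the activations would come from forward passes over each dataset, and the comparison would be repeated per language pair.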
In practice
- Focus optimization efforts on intermediate layers for multilingual math.
- Consider language-specific parameter tuning for lower-resource languages.
- Investigate pruning for output formatting improvements in specific tasks.
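As a minimal sketch of a targeted intervention, the snippet below multiplies a chosen set of weights by a factor: 0.0 prunes them, while a value above 1.0 scales them up. The factor 1.1, the function name, and the selected indices are assumptions for illustration; the summary does not state the values used in the study.

```python
from typing import List, Set, Tuple

Index = Tuple[int, int]  # (row, column) of a weight in one layer's matrix


def intervene(weights: List[List[float]],
              selected: Set[Index],
              factor: float) -> List[List[float]]:
    """Return a copy of weights with the selected entries multiplied by factor.

    factor = 0.0 prunes the selected weights; factor > 1.0 amplifies them.
    The input matrix is left unchanged.
    """
    return [
        [w * factor if (i, j) in selected else w for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]


weights = [[0.5, -2.0, 0.1], [1.5, 0.2, -1.0]]
math_params = {(0, 0), (1, 0)}  # hypothetical output of a localization step

pruned = intervene(weights, math_params, 0.0)   # zero out the selected weights
scaled = intervene(weights, math_params, 1.1)   # assumed scaling factor
```

Comparing task accuracy before and after each intervention, per language, indicates how much the selected parameters contribute.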
Topics
- Multilingual LLMs
- Mathematical Reasoning
- Parameter Localization
- Cross-lingual Transfer
- Model Interpretability
- Weight Pruning
- Jaccard Similarity
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.