English is Not All You Need: Systematically Exploring the Role of Multilinguality in LLM Post-Training
Summary
A systematic study investigated the impact of multilingual post-training on large language models, using 220 supervised fine-tuning runs on parallel translated data. The research covered mathematical reasoning and API calling tasks with models of up to 8 billion parameters. The findings indicate that increasing language coverage during post-training generally improves performance across tasks and model scales. Low-resource languages showed the largest gains, while high-resource languages plateaued without degrading. The study also found that even minimal multilingual inclusion, such as adding a single non-English language, improves both English performance and cross-lingual generalization, suggesting that English-only post-training is largely suboptimal. Furthermore, with sufficient language diversity, zero-shot cross-lingual transfer can match or surpass the benefit of including a language directly in training, although gains for typologically distant, low-resource languages remain limited.
Key takeaway
If you are an AI engineer or research scientist developing or fine-tuning large language models, prioritize multilingual post-training over English-only approaches. Incorporating even a single non-English language can improve both English performance and cross-lingual generalization, making your models more robust and more broadly applicable. Consider diversifying your training data to include low-resource languages, since they show the largest performance gains, and explore how sufficient language diversity can enable effective zero-shot cross-lingual transfer, keeping in mind that transfer to typologically distant, low-resource languages remains limited.
Key insights
Multilingual post-training significantly improves LLM performance and cross-lingual generalization, even with minimal non-English data.
Principles
- Increased language coverage benefits LLMs across scales.
- Low-resource languages gain most from multilingual training.
- English-only post-training is suboptimal.
Method
The study involved 220 supervised fine-tuning runs on parallel translated multilingual data mixtures, covering mathematical reasoning and API calling tasks, using models up to 8B parameters.
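The summary does not specify how the mixtures were built. The following is a minimal, hypothetical sketch of one way to derive mixtures with varying language coverage from parallel translated data. It assumes every English example has a translation in every language, and that each example ID is assigned to exactly one language so the mixture size stays fixed as coverage grows. That budget rule is an illustrative assumption, not necessarily the paper's. The names `build_mixture` and `zero_shot_languages`, the corpus layout, and the language codes are all hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: parallel[lang][i] is the translation of English example i.
ParallelCorpus = Dict[str, List[dict]]


def build_mixture(
    parallel: ParallelCorpus,
    languages: Sequence[str],
    seed: int = 0,
) -> List[Tuple[str, dict]]:
    """Assign each example ID to one language in `languages`.

    IDs are shuffled, then languages are assigned round-robin, so every
    language gets a near-equal share and the mixture always has as many
    examples as the English set (an assumed fixed-budget rule).
    """
    n = len(parallel["en"])
    for lang in languages:
        if len(parallel[lang]) != n:
            raise ValueError(f"{lang!r} is not parallel to 'en'")
    rng = random.Random(seed)
    ids = list(range(n))
    rng.shuffle(ids)
    mixture = []
    for slot, i in enumerate(ids):
        lang = languages[slot % len(languages)]
        mixture.append((lang, parallel[lang][i]))
    return mixture


def zero_shot_languages(eval_langs: Sequence[str], train_langs: Sequence[str]) -> List[str]:
    """Evaluation languages never seen in post-training (zero-shot transfer)."""
    return sorted(set(eval_langs) - set(train_langs))


if __name__ == "__main__":
    # Toy parallel corpus: 6 examples translated into 3 illustrative languages.
    langs = ["en", "de", "sw"]
    parallel = {lang: [{"prompt": f"{lang}-q{i}"} for i in range(6)] for lang in langs}
    mix = build_mixture(parallel, ["en", "sw"])
    print(Counter(lang for lang, _ in mix))
    print(zero_shot_languages(langs, ["en", "sw"]))
```

Holding the example count fixed isolates the effect of language coverage from the effect of simply training on more data; an alternative design would add every translation, which grows the mixture with each added language.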
In practice
- Incorporate non-English data in LLM post-training.
- Prioritize low-resource languages for greater impact.
- Leverage diverse language sets for zero-shot transfer.
Topics
- LLM Post-Training
- Multilingual LLMs
- Supervised Fine-Tuning
- Cross-Lingual Generalization
- Low-Resource Languages
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.