A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$Δ$ Integration into Upcycled MoE
Summary
PARAM$Δ$ Integration into Upcycled MoE is a method for expanding Large Language Models (LLMs) to new languages without the usual cost and data demands. It upcycles a dense LLM into a Mixture-of-Experts (MoE) architecture and assigns specific experts to different languages. It then transfers alignment capabilities by grafting a MoE-expanded parameter delta ($Δ_{\text{post}}$) onto a base model enhanced by Continued Pre-Training (CPT), which avoids a separate, complex, and data-intensive alignment stage. The approach targets a trade-off in data-free merging methods, which often dilute new-language acquisition in order to preserve original abilities. The reported experiments show stronger performance on the expanded languages with original capabilities maintained, including against baselines with comparable FLOPs or parameter counts.
Key takeaway
For research scientists developing multilingual LLMs, PARAM$Δ$ Integration offers a data-efficient path to language expansion. You should consider this MoE-based approach to bypass costly alignment phases and mitigate the trade-off between new language acquisition and original capability preservation, potentially reducing computational resources and development time.
Key insights
Upcycling dense LLMs into MoE architectures with parameter delta integration efficiently expands language capabilities.
Principles
- Allocate experts to specific languages.
- Transfer alignment via parameter delta grafting.
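As a worked form of the second principle, delta grafting can be written as below. This follows the usual parameter-delta formulation and is an assumption rather than the paper's exact notation: $Θ_{\text{base}}$, $Θ_{\text{post}}$, $Θ_{\text{CPT}}$, $Θ_{\text{merged}}$, and the $\operatorname{Expand}$ operator (which maps a dense delta onto the MoE parameter layout) are illustrative names not given in this summary.

$$
Δ_{\text{post}} = Θ_{\text{post}} - Θ_{\text{base}}, \qquad Θ_{\text{merged}} = Θ_{\text{CPT}} + \operatorname{Expand}(Δ_{\text{post}})
$$

Here $Θ_{\text{base}}$ is the original dense base model, $Θ_{\text{post}}$ is its post-trained (aligned) counterpart, and $Θ_{\text{CPT}}$ is the upcycled MoE model after continued pre-training.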
Method
Upcycle a dense model into a Mixture-of-Experts (MoE) architecture and allocate experts to languages. Then graft a MoE-expanded parameter delta ($Δ_{\text{post}}$) onto the CPT-enhanced base model to transfer alignment.
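The two steps can be sketched on a single feed-forward weight matrix. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`upcycle_ffn`, `expand_delta`, `graft`) are hypothetical, the delta is assumed to be expanded by copying it to every expert, CPT is simulated with random noise, and the router and non-FFN parameters are left out.

```python
import numpy as np


def upcycle_ffn(dense_ffn: np.ndarray, num_experts: int) -> np.ndarray:
    """Upcycle one dense FFN weight matrix into num_experts identical experts."""
    return np.stack([dense_ffn.copy() for _ in range(num_experts)])


def expand_delta(dense_delta: np.ndarray, num_experts: int) -> np.ndarray:
    """Expand a dense delta to the MoE layout (assumption: same delta per expert)."""
    return np.stack([dense_delta.copy() for _ in range(num_experts)])


def graft(cpt_experts: np.ndarray, expanded_delta: np.ndarray) -> np.ndarray:
    """Add the expanded post-training delta onto the CPT-enhanced expert weights."""
    return cpt_experts + expanded_delta


rng = np.random.default_rng(0)
num_experts = 3  # hypothetical: one expert per language

base = rng.normal(size=(4, 8))                # dense base weights
post = base + 0.01 * rng.normal(size=(4, 8))  # stand-in for aligned dense weights
delta_post = post - base                      # post-training parameter delta

experts = upcycle_ffn(base, num_experts)
# Stand-in for CPT: each expert drifts as if trained on its own language data.
cpt_experts = experts + 0.05 * rng.normal(size=experts.shape)

merged = graft(cpt_experts, expand_delta(delta_post, num_experts))
print(merged.shape)
```

In this sketch no alignment data is touched after CPT: the only alignment signal is the difference between two existing dense checkpoints, which is what makes the step data-free.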
In practice
- Apply to various LLM architectures.
- Integrate different post-training deltas.
Topics
- Multilingual LLMs
- Mixture-of-Experts
- Parameter Delta Integration
- Data-Efficient Language Expansion
- Model Upcycling
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.