A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$Δ$ Integration into Upcycled MoE

· Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

A new method called PARAM$Δ$ Integration into Upcycled MoE addresses the high cost and data demands of expanding Large Language Models (LLMs) to new languages. This approach upcycles a dense LLM into a Mixture-of-Experts (MoE) architecture, assigning specific experts to different languages. It transfers alignment capabilities by grafting a MoE-expanded parameter delta ($Δ_{\text{post}}$) onto a Continued Pre-Training (CPT)-enhanced base model, thereby avoiding the need for complex and data-intensive alignment. This technique resolves the trade-off in data-free merging methods, which often dilute new language acquisition when preserving original abilities. Experiments confirm PARAM$Δ$ Integration's superior performance on expanded languages while maintaining original capabilities, even against baselines with comparable FLOPs or parameter counts.

Key takeaway

For research scientists developing multilingual LLMs, PARAM$Δ$ Integration offers a data-efficient path to language expansion. You should consider this MoE-based approach to bypass costly alignment phases and mitigate the trade-off between new language acquisition and original capability preservation, potentially reducing computational resources and development time.

Key insights

Upcycling dense LLMs into MoE architectures with parameter delta integration efficiently expands language capabilities.

Principles

Method

Upcycle a dense model into a Mixture-of-Experts (MoE) architecture and allocate experts to languages. Then graft a MoE-expanded parameter delta ($Δ_{\text{post}}$) onto the CPT-enhanced base model to transfer alignment without a separate, data-intensive alignment phase.
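The grafting step can be illustrated with a toy sketch. It is not the authors' implementation. It assumes that $Δ_{\text{post}}$ is the difference between the post-trained and base weights of the dense model, that MoE expansion copies the dense FFN delta into every expert (mirroring upcycling), and that the router, which has no dense counterpart, receives a zero delta. All function names, tensor names, and values below are hypothetical, and "tensors" are flat lists of floats.

```python
# Toy sketch only: every name and number here is an illustrative assumption.

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    return [x - y for x, y in zip(a, b)]

def dense_delta(post, base):
    """Assumed definition: post-trained dense weights minus base dense weights."""
    return {name: sub(post[name], base[name]) for name in base}

def expand_delta_to_moe(delta, num_experts):
    """Mirror upcycling: each expert gets a copy of the dense FFN delta.
    The router has no dense counterpart, so its delta is assumed to be zero."""
    return {
        "attn": list(delta["attn"]),
        "experts": [list(delta["ffn"]) for _ in range(num_experts)],
        "router": [0.0] * num_experts,
    }

def graft(cpt_moe, moe_delta):
    """Add the MoE-expanded delta onto the CPT-enhanced MoE base weights."""
    return {
        "attn": add(cpt_moe["attn"], moe_delta["attn"]),
        "experts": [add(w, d) for w, d in zip(cpt_moe["experts"], moe_delta["experts"])],
        "router": add(cpt_moe["router"], moe_delta["router"]),
    }

# Dense base model and its post-trained (aligned) counterpart.
base = {"attn": [1.0, 2.0], "ffn": [0.5, -0.5]}
post = {"attn": [1.5, 2.0], "ffn": [0.75, -0.25]}

# Upcycled MoE after continued pre-training: expert 0 still matches the dense
# FFN, expert 1 has drifted toward a new language, and the router was trained.
cpt_moe = {
    "attn": [1.0, 2.0],
    "experts": [[0.5, -0.5], [1.5, 0.5]],
    "router": [0.0, 2.0],
}

merged = graft(cpt_moe, expand_delta_to_moe(dense_delta(post, base), num_experts=2))
print(merged)
```

In this sketch each expert keeps what it learned during CPT and receives the same alignment shift, which is one plausible reading of how alignment is transferred without a separate alignment phase.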

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.