A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAM$Δ$ Integration into Upcycled MoE

2026-05-18 · Source: Computation and Language · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Expert, quick

Summary

PARAM$Δ$ Integration into Upcycled MoE is a new method that targets the high cost and data demands of expanding Large Language Models (LLMs) to new languages. It upcycles a dense LLM into a Mixture-of-Experts (MoE) architecture and assigns specific experts to different languages. To transfer alignment capabilities, it grafts a MoE-expanded post-training parameter delta ($Δ_{\text{post}}$) onto a base model enhanced by Continued Pre-Training (CPT), which avoids a separate, complex, and data-intensive alignment phase. The approach addresses a trade-off in data-free merging methods, which often dilute new language acquisition in order to preserve original abilities. The reported experiments show stronger performance on the expanded languages while original capabilities are maintained, including against baselines with comparable FLOPs or parameter counts.

Key takeaway

For research scientists developing multilingual LLMs, PARAM$Δ$ Integration offers a data-efficient path to language expansion. You should consider this MoE-based approach to bypass costly alignment phases and mitigate the trade-off between new language acquisition and original capability preservation, potentially reducing computational resources and development time.

Key insights

Upcycling dense LLMs into MoE architectures with parameter delta integration efficiently expands language capabilities.

Principles

Allocate experts to specific languages.
Transfer alignment via parameter delta grafting.

Method

Upcycle a dense model into a Mixture-of-Experts (MoE) architecture and allocate experts to languages. Then graft a MoE-expanded parameter delta ($Δ_{\text{post}}$) onto the CPT-enhanced base model to transfer alignment without a separate alignment phase.
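The two steps can be illustrated with a toy sketch. It is not the authors' implementation, and it rests on assumptions the summary does not confirm: that $Δ_{\text{post}}$ is the element-wise difference between post-trained and base weights, that upcycling copies the dense feed-forward block into every expert, and that "MoE-expanded" means the same dense delta is replicated for each expert. All names (`param_delta`, `upcycle`, `graft`), the weights, and the language list are hypothetical.

```python
from typing import Dict, List

Vector = List[float]
DenseFFN = Dict[str, Vector]   # one dense feed-forward block: name -> flat weights
MoEFFN = List[DenseFFN]        # one copy of the block per expert


def param_delta(post: DenseFFN, base: DenseFFN) -> DenseFFN:
    """Assumed definition: delta_post = theta_post - theta_base, per parameter."""
    return {k: [p - b for p, b in zip(post[k], base[k])] for k in base}


def upcycle(dense: DenseFFN, num_experts: int) -> MoEFFN:
    """Upcycle a dense block by copying its weights into every expert."""
    return [{k: list(v) for k, v in dense.items()} for _ in range(num_experts)]


def graft(cpt_moe: MoEFFN, delta: DenseFFN) -> MoEFFN:
    """Replicate the dense delta per expert, then add it to each expert's weights."""
    expanded = upcycle(delta, len(cpt_moe))  # assumed form of the MoE-expanded delta
    return [
        {k: [w + d for w, d in zip(expert[k], exp_delta[k])] for k in expert}
        for expert, exp_delta in zip(cpt_moe, expanded)
    ]


# Toy weights (hypothetical values chosen to be exact in floating point).
base = {"w_in": [1.0, 2.0], "w_out": [0.5, -0.5]}
post = {"w_in": [1.5, 2.0], "w_out": [0.5, 0.0]}
delta_post = param_delta(post, base)

# Hypothetical allocation: one expert per language, in this order.
languages = ["en", "de", "ja"]
moe = upcycle(base, num_experts=len(languages))

# Stand-in for continued pre-training: only the "ja" expert's weights move.
moe[2]["w_in"] = [w + 0.25 for w in moe[2]["w_in"]]

aligned = graft(moe, delta_post)
```

In this sketch, an expert that CPT left untouched ends up identical to the post-trained dense block, while the expert that absorbed new-language training keeps its CPT shift and gains the alignment delta on top.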

In practice

Apply the method to various LLM architectures.
Integrate different post-training deltas.

Topics

Multilingual LLMs
Mixture-of-Experts
Parameter Delta Integration
Data-Efficient Language Expansion
Model Upcycling

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.