Mitigating Catastrophic Forgetting in Target Language Adaptation of LLMs via Source-Shielded Updates

Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, long

Summary

Researchers from the University of Sheffield, Hitachi, Ltd., and the University of Exeter introduced Source-Shielded Updates (SSU), a selective parameter update strategy for mitigating catastrophic forgetting when adapting instruction-tuned large language models (LLMs) to new target languages using only unlabeled data. The method preserves source knowledge proactively: before adaptation begins, it identifies the parameters most critical to the model's original capabilities and freezes them. In experiments with 7B and 13B OLMo 2 Instruct models across five typologically diverse languages, SSU limited the average performance degradation on monolingual source tasks to 3.4% (7B) and 2.8% (13B), compared with 20.3% and 22.3%, respectively, under full fine-tuning. SSU also achieved target-language performance competitive with, and often better than, full fine-tuning across various benchmarks.

Key takeaway

For AI engineers and research scientists working on multilingual LLM deployment, SSU offers a practical way to expand language coverage without sacrificing core model capabilities. Because it shields source knowledge before adaptation starts, it can deliver strong target-language performance while minimizing catastrophic forgetting, which is crucial for preserving the general-purpose functionality of instruction-tuned models. Consider integrating SSU into your adaptation pipeline, especially when instruction-tuning data for the target language is scarce or costly.

Key insights

Source-Shielded Updates (SSU) proactively freezes the parameters most critical to the model's source capabilities, mitigating catastrophic forgetting during LLM language adaptation.

Principles

Method

SSU involves three stages: (1) scoring parameter importance on source data with an importance metric such as Wanda, (2) generating a column-wise freezing mask from those scores, and (3) applying this mask during continual pre-training on unlabeled target-language data, so that only the unfrozen parameters are updated.
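
The three stages can be sketched on a single linear layer as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The Wanda score uses its standard form, |W_ij| times the L2 norm of input feature j over calibration activations. Four choices here are illustrative assumptions: treating "columns" as input features, summing scores within each column, the `freeze_ratio` value, and plain SGD as the update rule. The function names are hypothetical.

```python
import numpy as np


def wanda_scores(weight: np.ndarray, calib_inputs: np.ndarray) -> np.ndarray:
    """Wanda-style importance |W_ij| * ||X_j||_2 for one linear layer.

    weight:       (out_features, in_features), used as y = x @ weight.T
    calib_inputs: (n_tokens, in_features) activations from SOURCE-language data
    """
    act_norm = np.linalg.norm(calib_inputs, axis=0)  # L2 norm per input feature
    return np.abs(weight) * act_norm[np.newaxis, :]


def column_freeze_mask(scores: np.ndarray, freeze_ratio: float) -> np.ndarray:
    """Return a boolean vector over weight columns; True marks a frozen column.

    Assumption (illustrative): a column's importance is the sum of its scores,
    and the top `freeze_ratio` fraction of columns is frozen.
    """
    col_importance = scores.sum(axis=0)
    n_freeze = int(round(freeze_ratio * col_importance.size))
    frozen = np.zeros(col_importance.size, dtype=bool)
    if n_freeze > 0:  # guard: a slice like [-0:] would select every column
        frozen[np.argsort(col_importance)[-n_freeze:]] = True
    return frozen


def shielded_sgd_step(weight, grad, frozen_cols, lr):
    """One plain SGD step on target-language gradients that skips frozen columns."""
    update_mask = (~frozen_cols).astype(weight.dtype)[np.newaxis, :]
    return weight - lr * grad * update_mask


# Usage on random stand-in data (not real model weights or activations).
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 6))
X_src = rng.normal(size=(32, 6))       # stage 1: source calibration activations
frozen = column_freeze_mask(wanda_scores(W, X_src), freeze_ratio=0.5)  # stage 2
grad = rng.normal(size=W.shape)        # stand-in for a target-language gradient
W_new = shielded_sgd_step(W, grad, frozen, lr=0.1)                     # stage 3
print(int(frozen.sum()), np.allclose(W_new[:, frozen], W[:, frozen]))  # 3 True
```

Masking the update, rather than the gradient alone, matters in practice. With a stateful optimizer such as AdamW, a zeroed gradient can still let momentum and decoupled weight decay move an entry. Restoring the frozen values after each optimizer step is a simple safeguard.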

Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.