COMPASS: COntinual Multilingual PEFT with Adaptive Semantic Sampling
Summary
COMPASS (COntinual Multilingual PEFT with Adaptive Semantic Sampling) is a data-centric framework for adapting large language models (LLMs) to target languages while mitigating performance disparities across languages and negative cross-lingual interference. It uses parameter-efficient fine-tuning (PEFT), training lightweight, language-specific adapters on a carefully chosen subset of auxiliary multilingual data. The core of the framework is a distribution-aware sampling strategy that uses multilingual embeddings and clustering to identify semantic gaps between the current training data and the target usage distribution. By prioritizing auxiliary data from under-represented semantic clusters, COMPASS enhances positive cross-lingual transfer and minimizes interference. An extension, COMPASS-ECDA, provides a continual learning framework that monitors for data distribution shifts in production and dynamically updates adapters to prevent model staleness while preserving existing knowledge. The approach consistently outperforms baseline methods on the Phi-4-Mini, Llama-3.1-8B, and Qwen2.5-7B architectures, on benchmarks including Global-MMLU, MMLU-ProX, and OneRuler.
Key takeaway
For AI engineers building or maintaining multilingual LLMs, COMPASS offers a way to improve target-language performance and limit model staleness. Its distribution-aware semantic sampling selects auxiliary data for the semantic clusters where the current training data falls short of the target usage distribution, which helps when adapting a model to new languages. For production environments that need ongoing adaptation, consider COMPASS-ECDA, which updates adapters as the data distribution shifts.
Key insights
COMPASS adapts LLMs to new languages using PEFT and semantic sampling to minimize cross-lingual interference.
Principles
- Prioritize data from under-represented semantic clusters.
- Balance adaptation to new data with preservation of existing knowledge.
Method
COMPASS uses multilingual embeddings and clustering to identify semantic gaps, then samples auxiliary data from under-represented clusters for PEFT adapter training.
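The sampling step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes every example has already been embedded with a multilingual encoder and assigned a cluster id (for instance by k-means), and it assumes a "gap" is the per-cluster shortfall of the training share relative to the target share. The function names (`cluster_shares`, `semantic_gaps`, `sample_auxiliary`) and the proportional budget allocation are hypothetical choices made for this example.

```python
import random
from collections import Counter, defaultdict


def cluster_shares(cluster_ids, n_clusters):
    """Fraction of items that fall in each cluster (all zeros for empty input)."""
    total = len(cluster_ids)
    if total == 0:
        return [0.0] * n_clusters
    counts = Counter(cluster_ids)
    return [counts.get(c, 0) / total for c in range(n_clusters)]


def semantic_gaps(train_ids, target_ids, n_clusters):
    """Per-cluster shortfall of the training data relative to the target usage.

    Assumption for this sketch: gap = max(0, target share - training share).
    """
    train = cluster_shares(train_ids, n_clusters)
    target = cluster_shares(target_ids, n_clusters)
    return [max(0.0, t - s) for s, t in zip(train, target)]


def sample_auxiliary(aux_items, aux_ids, gaps, budget, seed=0):
    """Draw auxiliary items, giving each cluster a quota proportional to its gap."""
    rng = random.Random(seed)
    pool = defaultdict(list)
    for item, cid in zip(aux_items, aux_ids):
        pool[cid].append(item)

    total_gap = sum(gaps)
    if total_gap == 0:
        return []  # training data already covers the target distribution

    chosen = []
    for cid, gap in enumerate(gaps):
        # Never ask for more items than the auxiliary pool holds for this cluster.
        quota = min(round(budget * gap / total_gap), len(pool[cid]))
        chosen.extend(rng.sample(pool[cid], quota))
    return chosen


# Toy usage: cluster 0 is over-represented in training, clusters 1 and 2 are not.
train_ids = [0, 0, 0, 1]
target_ids = [0, 1, 1, 2]
gaps = semantic_gaps(train_ids, target_ids, n_clusters=3)
selected = sample_auxiliary(
    ["a0", "a1", "b1", "c2", "d2"], [0, 1, 1, 2, 2], gaps, budget=2
)
```

In this toy run, auxiliary items from cluster 0 are never selected because the training data already over-covers that cluster; the selected subset would then be used to train the language-specific adapter.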
In practice
- Train language-specific adapters for multilingual LLMs.
- Implement distribution-aware sampling for data selection.
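For the continual setting that the summary attributes to COMPASS-ECDA, one way to picture the monitoring step is a drift check over cluster histograms. The summary does not say which drift measure the authors use, so the sketch below swaps in Jensen-Shannon divergence as an assumed stand-in; the function names and the `threshold` value are likewise hypothetical.

```python
import math
from collections import Counter


def histogram(cluster_ids, n_clusters):
    """Normalized cluster histogram for a non-empty window of cluster ids."""
    counts = Counter(cluster_ids)
    total = len(cluster_ids)
    return [counts.get(c, 0) / total for c in range(n_clusters)]


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical, 1 for disjoint distributions."""
    def kl(a, b):
        # Terms with a_i == 0 contribute nothing; b_i > 0 whenever a_i > 0 here.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    mixture = [(x + y) / 2 for x, y in zip(p, q)]
    return 0.5 * kl(p, mixture) + 0.5 * kl(q, mixture)


def needs_adapter_update(reference_ids, production_ids, n_clusters, threshold=0.1):
    """Flag an adapter refresh when production traffic drifts from the reference window.

    The threshold is an illustrative value, not one taken from the paper.
    """
    drift = js_divergence(
        histogram(reference_ids, n_clusters),
        histogram(production_ids, n_clusters),
    )
    return drift > threshold, drift


# Same cluster mix: no update. Completely different mix: update.
stable, stable_drift = needs_adapter_update([0, 1, 0, 1], [1, 0, 1, 0], n_clusters=2)
shifted, shifted_drift = needs_adapter_update([0, 0, 0, 0], [1, 1, 1, 1], n_clusters=2)
```

When the check fires, the sampling step can be rerun against the new production window to choose data for the adapter update; how existing knowledge is preserved during that update is not covered by this sketch.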
Topics
- Continual Multilingual PEFT
- Adaptive Semantic Sampling
- Cross-lingual Interference
- Large Language Models
- Data Distribution Shifts
Best for: AI Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.