Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey
Summary
The study investigates vector merging methods for multilingual knowledge editing (MKE) in large language models (LLMs), where edits made for one language can interfere with edits made for another. The researchers evaluated six merging variants on two backbones (Llama3.1-8B-Instruct and Qwen2.5-7B-Instruct), with two base knowledge editing methods (MEMIT and AlphaEdit) and 12 languages from the MzsRE benchmark, in a large-scale batch-editing setting (batch size = 700 × 12). The findings indicate that vector summation with shared covariance is the most reliable strategy, while simple summation without shared covariance performs poorly. Task Singular Vectors for Merging (TSVM) showed limited ability to mitigate multilingual interference and improved performance only in specific scenarios. Performance is also highly sensitive to the weight scaling factor and the rank compression ratio: the best results often come at a scaling factor slightly above the default and at relatively low rank.
Key takeaway
Research scientists developing multilingual LLM editing solutions should prioritize methods that explicitly model cross-lingual compatibility rather than relying solely on post hoc merging. When merging editing vectors, use shared covariance and tune the weight scaling factor empirically, as values slightly above 1.0 often yield better performance. Rank compression ratios are also worth exploring, as lower ranks can benefit TSVM-based methods.
Key insights
Shared covariance in vector merging is crucial for effective multilingual knowledge editing in LLMs.
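The summary does not spell out what shared covariance means in the merge. As one possible reading, assumed here rather than taken from the paper, each language's update can be viewed as a MEMIT-style regularized least-squares solution, and the shared variant solves a single system over the pooled keys of all languages instead of adding up independently solved updates. The function names, toy shapes, identity covariance, and regularization weight `lam` below are all hypothetical.

```python
import numpy as np

def closed_form_update(residuals, keys, cov, lam=1.0):
    """Regularized least-squares update: R K^T (lam * C + K K^T)^{-1}.

    residuals: (d_out, n) target changes, keys: (d_in, n), cov: (d_in, d_in).
    """
    gram = lam * cov + keys @ keys.T      # (d_in, d_in), symmetric
    rhs = residuals @ keys.T              # (d_out, d_in)
    # Solve X @ gram = rhs for X without forming an explicit inverse.
    return np.linalg.solve(gram, rhs.T).T

def sum_separate(per_lang, cov, lam=1.0):
    """Solve each language on its own, then add the resulting updates."""
    return sum(closed_form_update(r, k, cov, lam) for r, k in per_lang)

def sum_shared(per_lang, cov, lam=1.0):
    """Pool the keys and residuals of all languages into one system (assumed reading)."""
    residuals = np.concatenate([r for r, _ in per_lang], axis=1)
    keys = np.concatenate([k for _, k in per_lang], axis=1)
    return closed_form_update(residuals, keys, cov, lam)

# Toy data: 3 languages, 5 edits each, one 6 -> 4 linear layer.
rng = np.random.default_rng(0)
d_in, d_out, n_edits, n_langs = 6, 4, 5, 3
cov = np.eye(d_in)  # stand-in for the key covariance of preserved knowledge
per_lang = [
    (rng.standard_normal((d_out, n_edits)), rng.standard_normal((d_in, n_edits)))
    for _ in range(n_langs)
]
delta_separate = sum_separate(per_lang, cov)
delta_shared = sum_shared(per_lang, cov)
```

Under this reading, the two results coincide when there is only one language and diverge once several languages contribute keys, because the pooled solve accounts for every language's keys in one denominator.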
Principles
- Multilingual interference remains a significant bottleneck in MKE.
- The optimal weight scaling factor can exceed the default value of 1.0.
- Multilingual editing vectors tolerate low-rank compression, with the best results often at relatively low rank.
Method
The study systematically evaluated six vector merging functions, including Sum, Mean, and TSVM, with and without shared covariance, applied to editing vectors derived from locate-then-edit KE methods.
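To make the two simplest merging functions concrete, the following minimal sketch treats each language's editing vector as a weight-update matrix for a single layer, merges the matrices by Sum or Mean, and applies a weight scaling factor before adding the result to the base weights. The names `merge_sum`, `merge_mean`, and `scale`, along with the toy shapes, are illustrative assumptions. TSVM and the shared-covariance variants are not shown.

```python
import numpy as np

def merge_sum(deltas, scale=1.0):
    """Add the per-language update matrices, then apply a global scaling factor."""
    return scale * np.sum(deltas, axis=0)

def merge_mean(deltas, scale=1.0):
    """Average the per-language update matrices, then apply a global scaling factor."""
    return scale * np.mean(deltas, axis=0)

# Toy data: 3 languages, one update matrix each for a 4 x 6 weight.
rng = np.random.default_rng(0)
deltas = [rng.standard_normal((4, 6)) for _ in range(3)]
base_weight = rng.standard_normal((4, 6))

# Apply the merged edit; scale=1.0 corresponds to the default mentioned in the summary.
edited_weight = base_weight + merge_sum(deltas, scale=1.0)
```

Sum and Mean differ only by a constant factor equal to the number of languages, so the choice between them interacts directly with the scaling factor.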
In practice
- Prioritize shared covariance in MKE vector merging.
- Experiment with weight scaling factors >1.0 for performance gains.
- Consider lower rank compression ratios for TSVM-based methods.
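As a rough illustration of what a rank compression ratio controls, the sketch below truncates an update matrix to a fraction of its singular components using a plain SVD. It covers only this truncation step and is not an implementation of TSVM. The function name `compress_rank`, the rounding rule, and the toy matrix are assumptions made for the example.

```python
import numpy as np

def compress_rank(delta, ratio):
    """Keep the top ceil(ratio * r) singular components, where r = min(delta.shape)."""
    u, s, vt = np.linalg.svd(delta, full_matrices=False)
    k = max(1, int(np.ceil(ratio * len(s))))
    # Rebuild the matrix from the k largest singular values and their vectors.
    return (u[:, :k] * s[:k]) @ vt[:k]

rng = np.random.default_rng(0)
delta = rng.standard_normal((8, 8))

# Sweep a few ratios and report the rank kept and the relative reconstruction error.
for ratio in (0.25, 0.5, 1.0):
    low_rank = compress_rank(delta, ratio)
    err = np.linalg.norm(delta - low_rank) / np.linalg.norm(delta)
    print(f"ratio={ratio:.2f} rank={np.linalg.matrix_rank(low_rank)} rel_error={err:.3f}")
```

In an actual tuning run, the same kind of sweep would be paired with an editing benchmark score rather than reconstruction error, and repeated over candidate scaling factors above 1.0.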
Topics
- Multilingual Knowledge Editing
- Large Language Models
- Vector Merging Methods
- Task Singular Vectors for Merging
- MEMIT and AlphaEdit
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.