Merging Methods for Multilingual Knowledge Editing for Large Language Models: An Empirical Odyssey

Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

A study investigated vector merging methods for multilingual knowledge editing (MKE) in large language models (LLMs), a setting where edits made in one language can interfere with edits made in another. The researchers evaluated six merging variants on two backbones (Llama3.1-8B-Instruct and Qwen2.5-7B-Instruct), with two base knowledge editing methods (MEMIT and AlphaEdit) and 12 languages from the MzsRE benchmark, in a large-scale batch-editing setting (batch size = 700 × 12). The findings indicate that vector summation with shared covariance is the most reliable strategy, while simple summation without shared covariance performs poorly. Task Singular Vectors for Merging (TSVM) showed only a limited ability to mitigate multilingual interference, improving performance in specific scenarios only. Performance was also highly sensitive to the weight scaling factor and the rank compression ratio, with the best results often obtained at a scaling factor slightly above the default and at a relatively low rank.
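
To make the contrast between the two summation strategies concrete, here is a minimal NumPy sketch under stated assumptions. It assumes a MEMIT-style closed-form update, ΔW = R Kᵀ (C₀ + K Kᵀ)⁻¹, where K holds the edited keys, R the residuals to write, and C₀ a covariance of keys whose outputs should be preserved. Reading "shared covariance" as summing every language's numerator over a single inverse built from all languages' keys is our interpretation, not necessarily the paper's exact formulation; AlphaEdit's null-space projection is ignored. The function names and random data are hypothetical.

```python
import numpy as np


def memit_delta(R, K, C0):
    """MEMIT-style closed-form update (assumed form): R K^T (C0 + K K^T)^{-1}.

    R: (d_out, n) residuals to write; K: (d_in, n) edited keys;
    C0: (d_in, d_in) covariance of keys whose outputs should be preserved.
    """
    A = C0 + K @ K.T
    # Solve X A = R K^T through A^T X^T = (R K^T)^T instead of inverting A.
    return np.linalg.solve(A.T, (R @ K.T).T).T


def sum_per_language(Rs, Ks, C0):
    """Plain summation: each language's delta sees only its own keys in the inverse."""
    return sum(memit_delta(R, K, C0) for R, K in zip(Rs, Ks))


def sum_shared_cov(Rs, Ks, C0):
    """Summation with a shared covariance: one inverse built from all languages' keys."""
    numerator = sum(R @ K.T for R, K in zip(Rs, Ks))
    shared = C0 + sum(K @ K.T for K in Ks)
    return np.linalg.solve(shared.T, numerator.T).T


rng = np.random.default_rng(0)
d_in, d_out, n_per_lang, n_langs = 16, 8, 5, 3
C0 = np.eye(d_in)  # hypothetical stand-in for the preserved-key covariance
Ks = [rng.normal(size=(d_in, n_per_lang)) for _ in range(n_langs)]
Rs = [rng.normal(size=(d_out, n_per_lang)) for _ in range(n_langs)]

naive = sum_per_language(Rs, Ks, C0)
shared = sum_shared_cov(Rs, Ks, C0)
# A single joint edit computed over every language's keys at once.
joint = memit_delta(np.hstack(Rs), np.hstack(Ks), C0)
print(np.allclose(shared, joint))  # True: the shared form reproduces the joint edit
print(np.allclose(naive, joint))   # typically False
```

Under these assumptions, summing with a shared covariance is algebraically the same as editing all languages jointly, whereas plain summation combines per-language solutions that were each computed without regard to the other languages' keys.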

Key takeaway

Research scientists building multilingual LLM editing methods should prioritize approaches that explicitly model cross-lingual compatibility rather than relying solely on post hoc merging. Your approach should use shared covariance when merging vectors and tune the weight scaling factor empirically, since values slightly above 1.0 often yield better performance. Also explore the rank compression ratio, as lower ranks can benefit TSVM-based methods.
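
The two knobs mentioned above can be illustrated with a small sketch. It assumes the merged edit is applied as W + α·ΔW and that the rank compression ratio ρ keeps the top ⌈ρ · min(d_out, d_in)⌉ singular components of ΔW. The names `compress_rank`, `apply_edit`, `alpha` and `rank_ratio` are hypothetical, and the grid values are illustrative rather than the settings used in the study.

```python
import math

import numpy as np


def compress_rank(delta, rank_ratio):
    """Keep the top ceil(rank_ratio * min(delta.shape)) singular components of delta."""
    U, S, Vt = np.linalg.svd(delta, full_matrices=False)
    k = max(1, math.ceil(rank_ratio * min(delta.shape)))
    # Scale the first k left singular vectors by their singular values, then recombine.
    return (U[:, :k] * S[:k]) @ Vt[:k, :]


def apply_edit(W, delta, alpha=1.0, rank_ratio=1.0):
    """Return the edited weight W + alpha * (rank-compressed delta)."""
    return W + alpha * compress_rank(delta, rank_ratio)


rng = np.random.default_rng(1)
W = rng.normal(size=(8, 16))
delta = rng.normal(size=(8, 16))  # stand-in for a merged editing matrix

# Hypothetical sweep: in practice each setting would be scored on held-out
# editing metrics rather than just inspected.
for alpha in (1.0, 1.1, 1.2):
    for rank_ratio in (0.1, 0.25, 1.0):
        W_edit = apply_edit(W, delta, alpha, rank_ratio)
        applied_rank = np.linalg.matrix_rank(W_edit - W, tol=1e-8)
        print(f"alpha={alpha:.1f} rank_ratio={rank_ratio:.2f} applied_rank={applied_rank}")
```

Because both knobs change the edit without recomputing it, a grid like this is cheap to run, which is what makes the empirical tuning recommended above practical.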

Key insights

Shared covariance in vector merging is crucial for effective multilingual knowledge editing in LLMs.
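
One way to see why a shared covariance could matter, assuming a MEMIT-style least-squares objective (our assumption, not a formulation quoted from the paper): with languages l = 1, …, L, keys K_l, target residuals R_l, and a preservation covariance C_0, a joint batch edit minimizes the objective below. Setting its gradient to zero couples every language through a single covariance term, whereas per-language solutions summed afterwards do not.

```latex
\begin{aligned}
\Delta^\star &= \arg\min_{\Delta} \sum_{l=1}^{L} \lVert \Delta K_l - R_l \rVert_F^2
  + \operatorname{tr}\!\left(\Delta C_0 \Delta^\top\right) \\
0 &= \sum_{l=1}^{L} \left(\Delta^\star K_l - R_l\right) K_l^\top + \Delta^\star C_0
\;\Longrightarrow\;
\Delta^\star = \Big(\sum_{l=1}^{L} R_l K_l^\top\Big)\Big(C_0 + \sum_{l=1}^{L} K_l K_l^\top\Big)^{-1} \\
\Delta_{\text{sum}} &= \sum_{l=1}^{L} R_l K_l^\top \left(C_0 + K_l K_l^\top\right)^{-1}
  \neq \Delta^\star \quad \text{in general}
\end{aligned}
```

Each term of Δ_sum is optimal only for its own language's keys, so plain summation ignores how the other languages' keys load the same directions; this is one plausible account of why it underperforms.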

Principles

Method

The study systematically evaluated six vector merging variants, built from the Sum, Mean, and TSVM merging functions applied with and without shared covariance, on editing vectors produced by locate-then-edit KE methods.
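
For readers unfamiliar with TSVM, the sketch below follows the general recipe of Task Singular Vectors Merging as we understand it: truncate each per-language editing matrix to its top singular components, concatenate the retained left and right singular vectors across languages, orthogonalize each concatenation (here via the polar factor P Qᵀ of its SVD) to reduce cross-language interference, and reassemble. Plain Sum and Mean are included for comparison. The function names, the default per-language rank split, and the random inputs are assumptions; the paper's implementation may differ, for example in how it combines TSVM with shared covariance.

```python
import numpy as np


def merge_sum(deltas):
    """Element-wise sum of per-language editing matrices."""
    return sum(deltas)


def merge_mean(deltas):
    """Element-wise average of per-language editing matrices."""
    return sum(deltas) / len(deltas)


def _orthogonalize(M):
    """Nearest matrix to M with orthonormal columns (polar factor P @ Q^T of its SVD)."""
    P, _, Qt = np.linalg.svd(M, full_matrices=False)
    return P @ Qt


def merge_tsvm(deltas, rank_ratio=None, alpha=1.0):
    """TSVM-style merge of per-language editing matrices (sketch).

    rank_ratio: fraction of min(shape) kept per language; defaults to
    1 / len(deltas) so the concatenated singular vectors fit within min(shape).
    """
    ratio = 1.0 / len(deltas) if rank_ratio is None else rank_ratio
    k = max(1, int(ratio * min(deltas[0].shape)))
    Us, Ss, Vts = [], [], []
    for D in deltas:
        U, S, Vt = np.linalg.svd(D, full_matrices=False)
        Us.append(U[:, :k])
        Ss.append(S[:k])
        Vts.append(Vt[:k, :])
    U_cat = np.hstack(Us)       # (d_out, n_langs * k)
    S_cat = np.concatenate(Ss)  # (n_langs * k,)
    Vt_cat = np.vstack(Vts)     # (n_langs * k, d_in)
    # Decorrelate the language-specific subspaces before recombining them.
    U_orth = _orthogonalize(U_cat)
    Vt_orth = _orthogonalize(Vt_cat.T).T
    return alpha * (U_orth * S_cat) @ Vt_orth


rng = np.random.default_rng(2)
deltas = [rng.normal(size=(8, 16)) for _ in range(3)]  # e.g. one matrix per language
for name, merged in [("sum", merge_sum(deltas)),
                     ("mean", merge_mean(deltas)),
                     ("tsvm", merge_tsvm(deltas))]:
    print(name, merged.shape, round(float(np.linalg.norm(merged)), 3))
```

With a single language and no truncation, the orthogonalization is a no-op and TSVM returns the original matrix; its effect only appears once several languages' subspaces overlap.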

In practice

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.