Enhancing Multilingual Reasoning via Steerable Model Merging

· Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

The Steerable Model Merging (ST-Merge) framework improves multilingual reasoning in large language models (LLMs) by dynamically adjusting the contributions of a multilingual encoder and a reasoning LLM. Traditional "one-size-fits-all" merging methods often create conflicts between the source models, which leads to suboptimal performance and, in particular, degrades reasoning in languages where the LLM is already proficient. ST-Merge addresses this with a gated cross-attention mechanism that incorporates a learnable language embedding and adaptively weights or filters each source model's contribution based on input characteristics. Evaluated on four multilingual reasoning benchmarks across 21 languages, ST-Merge consistently surpasses strong baselines such as MindMerger, with average gains of +1.7%, +1.3%, +1.5%, and +1.5% on the four benchmarks. It maintains high accuracy in English (68.0% on MGSM) while significantly improving performance in low-resource languages. The framework uses mT5-xl as the multilingual encoder and MetaMath (a fine-tuned LLaMA2-7B) as the reasoning LLM.

Key takeaway

NLP engineers building multilingual LLMs who face performance trade-offs between high-resource and low-resource languages should consider dynamic model merging. ST-Merge's adaptive weighting of the multilingual encoder and the reasoning LLM prevents degradation in languages the LLM already handles well while improving understanding of low-resource languages. The result is more robust reasoning that generalizes across diverse linguistic contexts.

Key insights

Dynamically modulating source model contributions resolves conflicts in multilingual LLM merging for improved reasoning.

Method

ST-Merge uses a gated cross-attention network and language embeddings to dynamically weight features from the multilingual encoder and the reasoning LLM, then fuses them into the input for the LLM decoder.
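
To make the idea of gated fusion concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the function names (`cross_attend`, `gated_fusion`), the single scalar gate per token, the concatenation fed to the gate, and the convex-combination fusion rule are all assumptions chosen for illustration. LLM-side token features act as queries over the encoder states, and a sigmoid gate conditioned on a language embedding decides how much encoder context to mix in.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cross_attend(query, keys, values):
    """Scaled dot-product attention of one query over the encoder states."""
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, k) / scale for k in keys])
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def gated_fusion(llm_feats, enc_feats, lang_emb, gate_w, gate_b):
    """Fuse LLM and encoder features token by token (hypothetical rule).

    Assumption: one scalar gate per token, computed from the concatenation
    [LLM feature ; attended encoder context ; language embedding].
    A gate near 0 keeps the LLM feature; a gate near 1 uses encoder context.
    """
    fused, gates = [], []
    for q in llm_feats:
        ctx = cross_attend(q, enc_feats, enc_feats)
        g = sigmoid(dot(gate_w, q + ctx + lang_emb) + gate_b)
        fused.append([g * c + (1.0 - g) * x for c, x in zip(ctx, q)])
        gates.append(g)
    return fused, gates


# Toy inputs: 2 LLM token features, 3 encoder states, dimension 2.
llm_feats = [[1.0, 0.0], [0.0, 1.0]]
enc_feats = [[0.5, 0.5], [1.0, -1.0], [0.0, 2.0]]
lang_emb = [0.1, -0.2]          # stand-in for a learned language embedding
gate_w = [0.0] * 6              # untrained gate weights (all zeros)

fused, gates = gated_fusion(llm_feats, enc_feats, lang_emb, gate_w, 0.0)
print("gates:", gates)
print("fused:", fused)
```

In this sketch a strongly negative gate bias leaves the LLM features essentially untouched, which mirrors the stated goal of not degrading languages the LLM already handles well, while a strongly positive bias hands control to the encoder context. In a trained system the gate parameters and the language embedding would be learned rather than set by hand.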

Topics

Best for: AI Engineer, Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.