Enhancing Multilingual Reasoning via Steerable Model Merging

2026-06-18 · Source: cs.CL updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, extended

Summary

The Steerable Model Merging (ST-Merge) framework improves multilingual reasoning in large language models (LLMs) by dynamically adjusting how much a multilingual encoder and a reasoning LLM each contribute to the merged model. Conventional "one-size-fits-all" merging applies the same combination to every input. This can create conflicts between the source models and lead to suboptimal performance, and it notably degrades reasoning in languages where the LLM is already proficient. ST-Merge addresses this with a gated cross-attention mechanism that adaptively weights or filters the source models' contributions based on characteristics of the input, including a learnable language embedding. On four multilingual reasoning benchmarks covering 21 languages, ST-Merge consistently outperforms strong baselines such as MindMerger, with average gains of +1.7%, +1.3%, +1.5%, and +1.5% on the four benchmarks, respectively. It maintains high English accuracy (68.0% on MGSM) while substantially improving performance in low-resource languages. The framework uses an mT5-xl encoder as the multilingual encoder and MetaMath (a fine-tuned LLaMA2-7B) as the reasoning LLM.

Key takeaway

For NLP engineers building multilingual LLMs: if you face performance trade-offs between high-resource and low-resource languages, consider dynamic model merging. ST-Merge's adaptive weighting of a multilingual encoder and a reasoning LLM avoids degrading languages the LLM already handles well while improving understanding of low-resource languages. This should help your models reason more robustly across diverse linguistic contexts.

Key insights

Dynamically modulating source model contributions resolves conflicts in multilingual LLM merging for improved reasoning.

Principles

Fixed model merging causes conflicts.
Input-aware modulation is crucial.
Language identity guides adaptation.

Method

ST-Merge employs a gated cross-attention network and language embeddings to dynamically weight multilingual encoder and reasoning LLM features, fusing them for the LLM decoder.
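
The Method description leaves the exact equations to the paper. What follows is a minimal Python sketch of one way gated cross-attention fusion can work, under stated assumptions. LLM hidden states act as queries over multilingual-encoder states, which are assumed to be already projected to the LLM's hidden width. A scalar sigmoid gate, computed from the LLM state concatenated with a language embedding, scales the attended encoder context before it is added back residually. The function names (`gated_cross_attention`, `random_matrix`), the scalar gate, and the residual form are hypothetical illustrations, not ST-Merge's published formulation.

```python
import math
import random


def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * v for w, v in zip(row, x)) for row in W]


def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gated_cross_attention(llm_states, enc_states, lang_emb, Wq, Wk, Wv, w_gate):
    """Hypothetical gated cross-attention fusion (illustrative sketch).

    llm_states: d-dim hidden states from the reasoning LLM (queries)
    enc_states: d-dim multilingual-encoder states, assumed already
                projected to the LLM width d (keys and values)
    lang_emb:   d-dim language embedding (assumed learnable)
    w_gate:     2d weights for a scalar gate over [h; lang_emb] (assumed)
    Returns the fused states and the per-position gate values.
    """
    d = len(llm_states[0])
    keys = [matvec(Wk, e) for e in enc_states]
    values = [matvec(Wv, e) for e in enc_states]
    fused, gates = [], []
    for h in llm_states:
        q = matvec(Wq, h)
        # scaled dot-product attention over encoder positions
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        attn = softmax(scores)
        context = [sum(p * v[j] for p, v in zip(attn, values)) for j in range(d)]
        # scalar gate in (0, 1): how much encoder context to inject here
        g = sigmoid(sum(w * x for w, x in zip(w_gate, h + lang_emb)))
        gates.append(g)
        # residual fusion: keep the LLM state and add the gated context
        fused.append([hi + g * ci for hi, ci in zip(h, context)])
    return fused, gates


def random_matrix(rows, cols, rng, scale=0.5):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


# Toy usage with random weights (shapes only; the values are meaningless).
rng = random.Random(0)
d = 4
Wq, Wk, Wv = (random_matrix(d, d, rng) for _ in range(3))
llm_states = random_matrix(3, d, rng, scale=1.0)  # 3 LLM positions
enc_states = random_matrix(5, d, rng, scale=1.0)  # 5 encoder positions
lang_emb = random_matrix(1, d, rng, scale=1.0)[0]
w_gate = random_matrix(1, 2 * d, rng)[0]
fused, gates = gated_cross_attention(llm_states, enc_states, lang_emb, Wq, Wk, Wv, w_gate)
print(len(fused), len(fused[0]))  # prints: 3 4
```

In this sketch, a gate near 0 leaves the LLM's own representation nearly untouched, which is one way a merge can avoid disturbing languages the LLM already handles well. A gate near 1 injects the full encoder context.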

In practice

Implement gated cross-attention.
Integrate language embeddings.
Prioritize multilingual encoder features for low-resource languages.
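
The last two practices can be sketched together. A learnable embedding per language feeds a gate that decides how much weight the multilingual encoder's features receive relative to the LLM's own features. This minimal Python sketch assumes a convex combination of the two feature vectors. It uses hand-set embedding values for English (`en`) and Swahili (`sw`) purely for illustration. `LanguageGatedMixer` and its parameters are hypothetical, and in a real model the embeddings and gate weights would be learned.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


class LanguageGatedMixer:
    """Hypothetical per-language mixer of encoder and LLM features.

    Each language code maps to an embedding vector; a linear gate over
    that embedding yields g in (0, 1). The output is
    g * encoder features + (1 - g) * LLM features. In training, the
    embeddings and gate weights would be learned; here they are hand-set.
    """

    def __init__(self, lang_embeddings, gate_weights, gate_bias=0.0):
        self.lang_embeddings = lang_embeddings  # dict: lang code -> list[float]
        self.gate_weights = gate_weights        # same length as each embedding
        self.gate_bias = gate_bias

    def gate(self, lang: str) -> float:
        emb = self.lang_embeddings[lang]
        z = sum(w * x for w, x in zip(self.gate_weights, emb)) + self.gate_bias
        return sigmoid(z)

    def mix(self, enc_feat, llm_feat, lang: str):
        g = self.gate(lang)
        # g -> 1 favors the multilingual encoder; g -> 0 keeps the LLM's own features
        return [g * a + (1.0 - g) * b for a, b in zip(enc_feat, llm_feat)]


# Illustrative, hand-set values (not from the paper): the English embedding
# pushes the gate down, the Swahili embedding pushes it up.
mixer = LanguageGatedMixer(
    lang_embeddings={"en": [-2.0, 0.0], "sw": [2.0, 0.0]},
    gate_weights=[1.0, 0.0],
)
enc_feat = [1.0, 1.0]
llm_feat = [0.0, 0.0]
print(round(mixer.gate("en"), 3), round(mixer.gate("sw"), 3))  # prints: 0.119 0.881
print(mixer.mix(enc_feat, llm_feat, "sw"))
```

A fixed merge corresponds to a single constant gate shared by every language. Letting the gate depend on the language embedding allows English to keep most of the LLM's native features (gate ≈ 0.12 here) while Swahili leans on the encoder (gate ≈ 0.88).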

Topics

Multilingual LLMs
Model Merging
Gated Cross-Attention
Low-Resource Languages
Cross-Lingual Transfer
MetaMath

Best for: AI Engineer, Research Scientist, AI Scientist, NLP Engineer, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CL updates on arXiv.org.