Closed-Form Spectral Regularization for Multi-Task Model Merging

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, extended

Summary

This paper introduces SWUDI and SWUDI-A, two closed-form spectral regularization methods for multi-task model merging. Existing state-of-the-art techniques such as WUDI and OptMerge combine independently fine-tuned expert models into a single multi-task model without training data, but they rely on hundreds of gradient descent iterations and consume significant resources (e.g., 85 minutes and 42 GB of GPU memory for a 3B-parameter LLM). The authors show that this iterative process implicitly regularizes an ill-posed normal equation in which small-eigenvalue directions amplify proxy noise. Building on this observation, SWUDI combines a soft exponential filter with a hard top-K truncation, while SWUDI-A selects the rank for each layer with adaptive rules. Both methods require only a single symmetric eigendecomposition per linear layer and need neither training data nor optimizer states. Across vision, language, and multimodal benchmarks, including a new MLLM benchmark, SWUDI and SWUDI-A match or exceed existing methods while reducing wall-clock time by 28–72x and peak GPU memory by up to 50%.

Key takeaway

For MLOps engineers deploying multi-task models, SWUDI and SWUDI-A offer a substantial efficiency gain: merging accuracy that matches or exceeds iterative methods, with 28–72x faster execution and up to 50% less peak GPU memory. Prefer SWUDI-A when merging diverse model architectures or low-rank LoRA deltas, since its adaptive layer-wise rank selection is hyperparameter-free, which shortens merge jobs and lowers compute cost.

Key insights

Iterative model merging implicitly regularizes an ill-posed inverse problem, which can be replaced by explicit closed-form spectral filtering.

Principles

Small eigenvalues in merging operators amplify proxy noise.
Iterative descent acts as an early-stopping spectral filter.
Adaptive rank rules improve robustness across architectures.
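
The first two principles can be checked numerically on a toy problem. The sketch below is illustrative and not taken from the paper: the matrices `A` and `b` and all helper names are hypothetical. It relies on a standard fact about least squares: gradient descent started from zero on a quadratic with positive definite matrix `G` equals a spectral filter that scales each eigen-direction by `1 - (1 - step * lam) ** iters`, so small-eigenvalue directions are strongly damped when the iteration stops early.

```python
import numpy as np

def gd_filter(lam, step, iters):
    """Filter factor on eigenvalue lam after `iters` gradient steps from zero."""
    return 1.0 - (1.0 - step * lam) ** iters

def gd_solution(G, c, step, iters):
    """Plain gradient descent on 0.5 * x^T G x - c^T x, started at x = 0."""
    x = np.zeros_like(c)
    for _ in range(iters):
        x = x - step * (G @ x - c)
    return x

def filtered_solution(G, c, step, iters):
    """The same iterate in closed form, from one symmetric eigendecomposition.

    Assumes G is positive definite so that dividing by lam is safe.
    """
    lam, V = np.linalg.eigh(G)  # eigenvalues in ascending order
    return V @ ((gd_filter(lam, step, iters) / lam) * (V.T @ c))

# Hypothetical ill-conditioned toy problem (an assumption, for illustration only).
rng = np.random.default_rng(0)
A = rng.standard_normal((40, 8)) @ np.diag(np.logspace(0, -3, 8))
b = rng.standard_normal(40)
G, c = A.T @ A, A.T @ b  # normal equation: G x = c

lam = np.linalg.eigvalsh(G)
step, iters = 1.0 / lam[-1], 200
x_gd = gd_solution(G, c, step, iters)
x_sf = filtered_solution(G, c, step, iters)
filt = gd_filter(lam, step, iters)

print("max |x_gd - x_sf|:", np.max(np.abs(x_gd - x_sf)))
print("filter on smallest / largest eigenvalue:", filt[0], filt[-1])
```

The two solutions agree to numerical precision, and the filter factor is close to zero on the smallest eigenvalue and close to one on the largest. That is the sense in which a fixed iteration budget acts as an implicit regularizer.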

Method

Formalize multi-task model merging as a noisy linear inverse problem. Propose a unified spectral filtering estimator, instantiated by SWUDI (soft exponential filter + hard truncation) and SWUDI-A (adaptive per-layer rank rules).
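
A minimal sketch of what a closed-form spectral filtering estimator can look like, assuming a symmetric positive semidefinite operator `G` and right-hand side `C`. The summary does not give the exact SWUDI filter, so the form used here is an assumption: a soft factor `1 - exp(-tau * lam)` (the continuous-time analogue of the gradient descent filter) applied only to the `top_k` largest eigenvalues. The function name, `tau`, and `top_k` are hypothetical, and the adaptive per-layer rule that SWUDI-A uses to choose the rank is not described in this summary, so the rank is simply passed in.

```python
import numpy as np

def spectral_filter_solve(G, C, tau, top_k):
    """Approximately solve G W = C with a soft exponential filter plus top-K truncation.

    Illustrative form only (assumption): gain(lam) = (1 - exp(-tau * lam)) / lam
    on the top_k largest eigenvalues, and 0 on all others.
    """
    lam, V = np.linalg.eigh(G)            # one symmetric eigendecomposition
    lam = np.clip(lam, 0.0, None)         # remove tiny negative round-off
    safe = np.where(lam > 0, lam, 1.0)    # avoid division by zero
    # -expm1(-x) computes 1 - exp(-x) accurately; the limit as lam -> 0 is tau.
    gain = np.where(lam > 0, -np.expm1(-tau * lam) / safe, tau)
    gain[: max(len(lam) - top_k, 0)] = 0.0  # hard truncation of the smallest modes
    return (V * gain) @ (V.T @ C)         # V diag(gain) V^T C

# Toy diagonal example (hypothetical numbers): noise hits the tiny-eigenvalue mode.
G = np.diag([4.0, 1.0, 1e-6])
w_true = np.array([1.0, -2.0, 0.5])
noise = np.array([0.0, 0.0, 1e-3])
C = G @ w_true + noise

w_naive = np.linalg.solve(G, C)                           # unregularized inverse
w_filtered = spectral_filter_solve(G, C, tau=10.0, top_k=2)

err_naive = np.linalg.norm(w_naive - w_true)
err_filtered = np.linalg.norm(w_filtered - w_true)
print("naive error:", err_naive)
print("filtered error:", err_filtered)
```

In this toy case the unregularized solve divides the noise by the `1e-6` eigenvalue and the error explodes, while the filtered estimate gives up the weakest direction and stays close to `w_true` on the others. `C` may also be a matrix with one column per output, which is the shape a linear layer's weights would take.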

In practice

Use SWUDI for 28–72x faster model merging than iterative methods.
Apply SWUDI-A for hyperparameter-free adaptive merging.
Combine spectral merging with test-time adaptation for further gains.

Topics

Model Merging
Spectral Regularization
Multi-task Learning
Foundation Models
Large Language Models
Multimodal AI
Computational Efficiency

Code references

WalkerWorldPeace/MLLMerging

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.