Essential Subspace Merging for Multi-Task Learning

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

Essential Subspace Merging (ESM) and its extension ESM++ are model-merging methods for multi-task learning that target inter-task interference. Model merging integrates the capabilities of multiple models, each fine-tuned from a shared pre-trained checkpoint, into a single unified model. The core observation is that the output shifts caused by task updates concentrate their energy in a small number of principal directions, which form an "essential subspace," while the remaining directions mainly contribute interference. Essential Subspace Decomposition (ESD) decomposes each task update along the principal components of its activation shift. ESM is a training-free static merging method that orthogonalizes and fuses these essential components. ESM++ extends it into a training-free dynamic merging method: it decomposes task-specific residuals into low-rank experts and selects among them at inference with prototype-based routing. According to the reported experiments, ESM and ESM++ preserve task knowledge and reduce interference across a range of task sets and model scales.

Key takeaway

For machine learning engineers building multi-task models, ESM and ESM++ are worth considering as ways to mitigate inter-task interference. Both are training-free: they combine task-specific knowledge from fine-tuned models into a compact multi-task architecture without retraining, which may simplify deployment and improve performance across diverse tasks.

Key insights

Model merging interference is reduced by isolating and fusing high-energy "essential subspaces" of task updates.
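To make "high-energy directions" concrete, here is a minimal NumPy sketch, not the authors' implementation. It assumes a single linear layer with task update `delta_w` (output dimension by input dimension), a batch of calibration inputs `x`, and a 90% energy threshold. The function name, the variable names, and the threshold are illustrative assumptions that do not come from the source.

```python
import numpy as np

def essential_rank(delta_w, x, energy=0.9):
    """Smallest number of principal directions of the activation shift
    that together hold at least `energy` of its total squared norm.

    Assumptions (not from the source): a linear layer, uncentered shifts,
    and a fixed energy threshold.
    """
    shift = x @ delta_w.T                        # (n, d_out): output change caused by the update
    s = np.linalg.svd(shift, compute_uv=False)   # singular values, largest first
    cum = np.cumsum(s**2) / np.sum(s**2)         # cumulative energy fraction
    k = int(np.searchsorted(cum, energy)) + 1    # first index reaching the threshold
    return min(k, len(s)), cum
```

If the observation in the summary holds for a given layer, the returned rank should be small compared with the layer's output dimension.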

Principles

Method

Essential Subspace Decomposition (ESD) decomposes each task update along the principal components of the activation shift it induces. ESM then orthogonalizes and fuses the resulting essential components. ESM++ further decomposes the task-specific residuals into low-rank experts and selects among them with prototype-based routing.
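The pipeline described above can be sketched in NumPy for a single linear layer. This is an illustration under stated assumptions, not the published algorithm: the source does not give the exact formulas, so the projection used for ESD, the nearest-orthonormal-matrix (orthogonal Procrustes) step used for orthogonalization, the truncated SVD used for the experts, and the cosine-similarity routing are all swapped-in choices. Every function and variable name is hypothetical.

```python
import numpy as np

def esd(delta_w, x, k):
    """Assumed ESD: split a task update into essential part and residual.

    The basis is the top-k right singular vectors of the (uncentered)
    activation shift x @ delta_w.T, i.e. directions in output space.
    """
    shift = x @ delta_w.T                           # (n, d_out)
    _, _, vt = np.linalg.svd(shift, full_matrices=False)
    v = vt[:k].T                                    # (d_out, k), orthonormal columns
    coeff = v.T @ delta_w                           # (k, d_in)
    essential = v @ coeff                           # projection onto the essential subspace
    return v, coeff, essential, delta_w - essential

def merge_essential(bases, coeffs):
    """Assumed ESM step: orthogonalize the stacked task bases, then fuse.

    Requires (number of tasks * k) <= d_out so the columns can be orthonormal.
    """
    v = np.concatenate(bases, axis=1)               # (d_out, T*k)
    c = np.concatenate(coeffs, axis=0)              # (T*k, d_in)
    p, _, qt = np.linalg.svd(v, full_matrices=False)
    v_orth = p @ qt                                 # nearest matrix with orthonormal columns
    return v_orth @ c                               # merged update, (d_out, d_in)

def low_rank_expert(residual, r):
    """Assumed ESM++ expert: rank-r factors of a task residual."""
    u, s, vt = np.linalg.svd(residual, full_matrices=False)
    return u[:, :r] * s[:r], vt[:r]                 # (d_out, r) and (r, d_in)

def route(feature, prototypes):
    """Assumed routing: index of the prototype with highest cosine similarity."""
    f = feature / np.linalg.norm(feature)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return int(np.argmax(p @ f))
```

A possible usage pattern, again hypothetical: run `esd` once per task, pass the collected bases and coefficients to `merge_essential` and add the result to the pre-trained weight, then at inference call `route` on an input feature to choose which task's `low_rank_expert` factors to apply on top. When the task bases are already mutually orthogonal, the orthogonalization leaves them unchanged and the merge reduces to a plain sum of the essential components.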

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.