Learning to Recover Task Experts from a Multi-Task Merged Model

Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital / Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

The ReTeX (Recover Task eXpert) framework addresses parameter interference in multi-task model merging: when task-specific experts are consolidated into one model, each individual task's performance degrades. Dynamic merging methods counter this by keeping task-specific components around, which adds storage and loading costs. ReTeX instead works from a single merged checkpoint. It treats the interference introduced by merging as additive offsets arising from affine transformations of the expert weights, and it predicts these offsets to restore each task expert's performance. To decide which expert to restore, a router-free task identifier compares the input against SVD subspace signatures computed offline and selects the task whose subspace leaves the smallest projection residual. The method recovers over 95% of individual-expert performance in both vision and NLP settings, improves generalization to unseen tasks, and shows emergent adaptive interpolation on out-of-distribution inputs.
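The summary does not say which merging rule ReTeX assumes. As a minimal illustration only, assume plain task arithmetic (merged = base + λ · sum of task vectors). Under that rule the merged weights are an affine function of each expert's weights, so every expert differs from the merged checkpoint by an additive offset. The sketch below computes those offsets directly from the experts (an oracle); ReTeX's point is to predict them instead, so the experts never need to be stored. `task_arithmetic_merge`, `interference_offset`, and λ = 0.3 are hypothetical choices, not the paper's.

```python
import numpy as np


def task_arithmetic_merge(base, experts, lam):
    """Hypothetical merge rule: base + lam * (sum of task vectors)."""
    task_vectors = [w - base for w in experts]
    return base + lam * np.sum(task_vectors, axis=0)


def interference_offset(merged, expert):
    """Additive offset that moves the merged weights back to one expert."""
    return expert - merged


rng = np.random.default_rng(0)
base = rng.normal(size=(4, 4))
# Three toy "experts", each a small perturbation of the shared base weights.
experts = [base + 0.1 * rng.normal(size=(4, 4)) for _ in range(3)]
merged = task_arithmetic_merge(base, experts, lam=0.3)

# Oracle offsets: ReTeX would predict these instead of storing them.
offsets = [interference_offset(merged, w) for w in experts]
recovered = [merged + d for d in offsets]

# The merge is affine in each expert: shifting expert 0 by delta
# shifts the merged weights by exactly lam * delta.
delta = rng.normal(size=(4, 4))
perturbed = [experts[0] + delta] + experts[1:]
shift = task_arithmetic_merge(base, perturbed, lam=0.3) - merged
```

Storing all of `offsets` would just reintroduce the per-expert storage that dynamic merging pays for; the sketch only shows what quantity a learned offset predictor has to produce.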

Key takeaway

For machine learning engineers building multi-task models whose merged checkpoints lose accuracy to parameter interference, ReTeX offers a way to recover individual task-expert performance. It reports over 95% of original expert performance from a single merged model, without the storage overhead of dynamic merging. Its offset prediction and SVD-based task identification are worth considering when you also need better generalization and adaptive handling of out-of-distribution tasks.

Key insights

ReTeX recovers individual task expert performance from a single merged model by predicting and undoing parameter interference via additive offsets.

Principles

Method

ReTeX predicts additive offsets that reverse the parameter perturbations introduced by merging. At inference, a task identifier built on SVD subspace signatures (computed offline) selects the task expert whose subspace yields the smallest projection residual for the input.
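The residual-based selection can be sketched as follows. The summary does not specify which features the signatures are built from or what rank they use, so this sketch assumes each task's signature is the top-`rank` right singular vectors of a matrix of that task's feature vectors, and that an input is assigned to the task whose subspace leaves the smallest residual after orthogonal projection. The function names, the synthetic data, and `rank = 3` are hypothetical.

```python
import numpy as np


def subspace_signature(features, rank):
    """Offline step: orthonormal basis (dim x rank) spanning a task's features.

    `features` is (n_samples, dim); the top right singular vectors span the
    dominant subspace of its rows.
    """
    _, _, vh = np.linalg.svd(features, full_matrices=False)
    return vh[:rank].T


def projection_residual(x, basis):
    """Norm of the component of x lying outside the subspace spanned by basis."""
    return np.linalg.norm(x - basis @ (basis.T @ x))


def identify_task(x, signatures):
    """Router-free choice: index of the task whose subspace best explains x."""
    residuals = [projection_residual(x, basis) for basis in signatures]
    return int(np.argmin(residuals))


# Synthetic demo: three tasks whose features live in different 3-D subspaces of R^32.
rng = np.random.default_rng(0)
dim, rank, n = 32, 3, 200
true_bases = [np.linalg.qr(rng.normal(size=(dim, rank)))[0] for _ in range(3)]
train = [rng.normal(size=(n, rank)) @ b.T for b in true_bases]
signatures = [subspace_signature(f, rank) for f in train]

# A slightly noisy input drawn from task 1's subspace.
x = true_bases[1] @ rng.normal(size=rank) + 0.01 * rng.normal(size=dim)
print(identify_task(x, signatures))
```

With subspaces this well separated, the task-1 input is assigned to task 1. The appeal of this design is that no router has to be trained: each task keeps only a `dim x rank` basis, and identification costs one projection per task.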

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.