Learning to Recover Task Experts from a Multi-Task Merged Model
Summary
The ReTeX (Recover Task eXpert) framework addresses parameter interference in multi-task model merging, a common issue where consolidating task-specific experts into a unified model degrades individual task performance. Unlike dynamic merging methods that incur high storage and loading costs for redundant components, ReTeX operates from a single merged checkpoint. It models parameter interference as additive offsets resulting from affine transformations during merging, then predicts these offsets to restore task-expert performance. A novel router-free task identifier, based on SVD subspace signatures computed offline, selects the appropriate expert at inference by minimizing projection residuals for a given input. This approach recovers over 95% of individual-expert performance across both vision and NLP domains and significantly enhances generalization to unseen tasks, demonstrating emergent adaptive interpolation for out-of-distribution scenarios.
Key takeaway
For machine learning engineers building multi-task models: if parameter interference is degrading per-task performance in your merged checkpoints, ReTeX offers a way to recover individual task-expert performance. It reports recovering over 95% of original expert performance from a single merged model, which avoids the storage overhead of dynamic merging. Consider integrating its offset prediction and SVD-based task identification to improve generalization and to handle out-of-distribution tasks adaptively.
Key insights
ReTeX recovers individual task expert performance from a single merged model by predicting and undoing parameter interference via additive offsets.
Principles
- Parameter interference in merged models can be modeled as affine transformations.
- Additive offsets can approximate parameter perturbations for expert recovery.
- SVD subspace signatures enable router-free task identification.
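To make the additive-offset principle concrete, here is a minimal NumPy sketch. It assumes a task-arithmetic style merge, which the summary does not specify, and it computes oracle offsets directly from the experts. ReTeX instead predicts the offsets from the merged checkpoint, and that learned predictor is not reproduced here. All names (`theta_pre`, `lam`, `recover_expert`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: one pretrained weight vector and three fine-tuned experts.
theta_pre = rng.normal(size=8)
experts = [theta_pre + 0.1 * rng.normal(size=8) for _ in range(3)]

# Assumed merge rule (task arithmetic): add scaled task vectors to the base.
lam = 0.5
task_vectors = [e - theta_pre for e in experts]
theta_merged = theta_pre + lam * np.sum(task_vectors, axis=0)

# Interference seen by each task, expressed as an additive offset
# between the merged weights and that task's expert.
# These are oracle offsets; ReTeX would predict them instead.
oracle_offsets = [e - theta_merged for e in experts]


def recover_expert(merged, offset):
    """Undo the merge perturbation for one task by adding its offset."""
    return merged + offset


recovered = [recover_expert(theta_merged, d) for d in oracle_offsets]
```

With oracle offsets the recovery is exact by construction. The quality of a learned predictor determines how close a real system gets to that bound.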
Method
ReTeX predicts additive offsets that reverse the parameter perturbations introduced by merging. A task identifier built from SVD subspace signatures, which are computed offline, selects the task expert at inference by finding the signature with the smallest projection residual for the input.
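The identification step can be sketched as follows. This is an illustration on synthetic features, not the authors' implementation: the choice of feature space, the signature rank, and the function names (`build_signature`, `projection_residual`, `identify_task`) are all assumptions.

```python
import numpy as np


def build_signature(features, rank):
    """Offline step: orthonormal basis (dim, rank) of a task's dominant feature subspace."""
    # Rows of `features` are per-sample feature vectors; the top right
    # singular vectors span the directions with the most energy.
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:rank].T


def projection_residual(x, basis):
    """Norm of the part of x that lies outside the subspace spanned by `basis`."""
    projected = basis @ (basis.T @ x)
    return float(np.linalg.norm(x - projected))


def identify_task(x, signatures):
    """Inference step: pick the task whose subspace leaves the smallest residual."""
    residuals = [projection_residual(x, b) for b in signatures]
    return int(np.argmin(residuals))


# Synthetic demo: three tasks whose features lie near random 3-D subspaces.
rng = np.random.default_rng(0)
dim, rank, n_samples = 16, 3, 50
true_bases = [np.linalg.qr(rng.normal(size=(dim, rank)))[0] for _ in range(3)]
task_features = [
    rng.normal(size=(n_samples, rank)) @ b.T + 0.01 * rng.normal(size=(n_samples, dim))
    for b in true_bases
]
signatures = [build_signature(f, rank) for f in task_features]

# A query drawn from task 1's subspace should be routed to task 1.
query = true_bases[1] @ rng.normal(size=rank)
predicted_task = identify_task(query, signatures)
```

Because the signatures are fixed matrices computed once, selection at inference needs no trained router, only a few matrix products per candidate task.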
In practice
- Recover over 95% of individual-expert performance.
- Improve generalization to unseen tasks.
- Adaptively interpolate knowledge for OOD tasks.
Topics
- Multi-task Learning
- Model Merging
- Parameter Interference
- Task Experts
- SVD Subspace Signatures
- Out-of-Distribution Generalization
Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.