Learning to Recover Task Experts from a Multi-Task Merged Model

Source: Artificial Intelligence · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

The ReTeX (Recover Task eXpert) framework addresses parameter interference in multi-task model merging: consolidating task-specific experts into one unified model degrades each expert's individual performance. ReTeX treats this interference as a perturbation of the parameters and approximates it as an additive offset. By predicting these offsets, it recovers over 95% of individual-expert performance from a single merged checkpoint across vision and NLP domains. When task identity is unknown, a router-free task identifier built on offline SVD subspace signatures selects the expert whose subspace gives the smallest projection residual. ReTeX also shows emergent adaptive interpolation of expert knowledge, which improves generalization to unseen and out-of-distribution tasks.

Key takeaway

For machine learning engineers building multi-task models, ReTeX offers a way to mitigate parameter interference and improve generalization. Consider combining its offset prediction with SVD-based task identification to recover individual-expert performance from a single merged checkpoint and to adapt better to unseen or out-of-distribution tasks.

Key insights

ReTeX recovers individual task-expert performance from merged multi-task models by predicting and undoing parameter interference.
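
The additive-offset view can be illustrated with a toy sketch. It assumes flat parameter vectors, uses plain averaging as a stand-in for the merge step (the summary does not say which merge is used), and simulates the offset predictor with a noisy copy of the true offset. The names `recover_expert`, `experts`, and `predicted` are hypothetical, and none of this is the authors' implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy "experts": one flat parameter vector per task.
experts = {"task_a": rng.normal(size=8), "task_b": rng.normal(size=8)}

# Assumption: simple averaging stands in for the merge step.
merged = np.mean(list(experts.values()), axis=0)

# Interference viewed as an additive perturbation: expert = merged + offset.
true_offsets = {name: theta - merged for name, theta in experts.items()}


def recover_expert(merged_params, predicted_offset):
    """Undo the interference by adding the predicted offset to the merged weights."""
    return merged_params + predicted_offset


# Assumption: a learned predictor would supply this; here it is the true
# offset plus small noise, to mimic an imperfect prediction.
predicted = true_offsets["task_a"] + 0.01 * rng.normal(size=8)
recovered = recover_expert(merged, predicted)

# Distance to the original expert before and after applying the offset.
err_merged = np.linalg.norm(experts["task_a"] - merged)
err_recovered = np.linalg.norm(experts["task_a"] - recovered)
print(f"merged error: {err_merged:.3f}, recovered error: {err_recovered:.3f}")
```

In this toy setting the recovered parameters sit much closer to the original expert than the merged ones do, and the gap depends only on how accurate the predicted offset is.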

Method

ReTeX predicts additive offsets to reverse parameter perturbations in merged models. It uses SVD subspace signatures, computed offline, to identify tasks at inference by selecting the subspace with the smallest projection residual.
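
The task-identification step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's procedure: each signature is taken to be the top right singular vectors of a task's (uncentered) feature matrix, the features are synthetic, and the function names `fit_signature`, `projection_residual`, and `identify_task` are hypothetical.

```python
import numpy as np


def fit_signature(features, rank):
    """Offline step: top-`rank` right singular vectors of an (n_samples, dim) matrix."""
    _, _, vt = np.linalg.svd(features, full_matrices=False)
    return vt[:rank].T  # shape (dim, rank), orthonormal columns


def projection_residual(x, basis):
    """Norm of the component of x that lies outside span(basis)."""
    return np.linalg.norm(x - basis @ (basis.T @ x))


def identify_task(x, signatures):
    """Inference step: pick the task whose subspace leaves the smallest residual."""
    residuals = {name: projection_residual(x, b) for name, b in signatures.items()}
    return min(residuals, key=residuals.get)


rng = np.random.default_rng(0)
dim, rank = 32, 4

# Synthetic data (assumption): each task's features lie near its own
# random low-dimensional subspace, plus a little noise.
true_bases = {name: rng.normal(size=(dim, rank)) for name in ("task_a", "task_b")}
signatures = {}
for name, basis in true_bases.items():
    coeffs = rng.normal(size=(200, rank))
    features = coeffs @ basis.T + 0.01 * rng.normal(size=(200, dim))
    signatures[name] = fit_signature(features, rank)

# Queries drawn from each task's subspace, with the task label withheld.
query_a = true_bases["task_a"] @ rng.normal(size=rank)
query_b = true_bases["task_b"] @ rng.normal(size=rank)
print(identify_task(query_a, signatures), identify_task(query_b, signatures))
```

No router is trained here: the signatures are computed once offline, and inference needs only one projection per candidate task followed by an argmin.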

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.