From Parameters to Feature Space: Task Arithmetic for Backdoor Mitigation in Model Merging

· Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

Linear Feature Path Minimization (LFPM) is a framework for mitigating backdoor attacks in model merging, a cost-effective way to combine multiple task-specific models into one. Existing task arithmetic defenses edit parameters directly, and as a result they often fail to remove backdoors without significantly degrading clean-task performance. LFPM instead adds an anti-backdoor task vector to the backdoored merged model. It formulates backdoor robustness from a unified feature-space perspective and uses the Cross-Task Linearity (CTL) framework to guide the optimization of that vector, suppressing backdoors while preserving clean-task performance. An optimization mechanism based on gradient accumulation and a loss path integral enforces backdoor suppression along the whole interpolation path. The reported experiments show strong robustness against backdoor attacks in both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) settings.
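To make the parameter-space picture concrete, here is a minimal sketch of standard task arithmetic (a task vector is the fine-tuned weights minus the pretrained weights, and merging adds scaled task vectors to the pretrained weights), followed by adding one more vector to the merged model. The toy two-weight "models", the helper names task_vector and apply_vectors, and the hand-picked tau_anti are all assumptions for illustration. In LFPM the anti-backdoor vector is obtained by optimization in feature space, which this summary does not specify and this sketch does not attempt.

```python
from typing import Dict, List

# Toy stand-in for a model state dict: parameter name -> flat list of weights.
Params = Dict[str, List[float]]


def task_vector(finetuned: Params, pretrained: Params) -> Params:
    """Task vector = fine-tuned weights minus pretrained weights."""
    return {
        name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
        for name in pretrained
    }


def apply_vectors(base: Params, vectors: List[Params], scale: float) -> Params:
    """Return base + scale * sum(vectors), leaving `base` unmodified."""
    out = {name: list(weights) for name, weights in base.items()}
    for tau in vectors:
        for name in out:
            out[name] = [w + scale * t for w, t in zip(out[name], tau[name])]
    return out


# Hypothetical weights: the second coordinate plays the role of the backdoor.
pretrained = {"w": [0.0, 0.0]}
clean_model = {"w": [1.0, 0.0]}
backdoored_model = {"w": [0.0, 2.0]}

taus = [
    task_vector(clean_model, pretrained),
    task_vector(backdoored_model, pretrained),
]
merged = apply_vectors(pretrained, taus, scale=0.5)

# Hand-picked for the toy; LFPM would learn this vector instead.
tau_anti = {"w": [0.0, -1.0]}
defended = apply_vectors(merged, [tau_anti], scale=1.0)

print(merged)    # {'w': [0.5, 1.0]}
print(defended)  # {'w': [0.5, 0.0]}
```

In this toy the added vector cancels the backdoor coordinate and leaves the clean coordinate untouched, which is the outcome the defense aims for. Choosing such a vector for a real model is the hard part.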

Key takeaway

For AI security engineers and machine learning engineers deploying merged models: existing task arithmetic defenses often sacrifice clean-task performance to remove a backdoor. Consider Linear Feature Path Minimization (LFPM) as an alternative. It suppresses backdoors by operating in feature space rather than editing parameters directly, which is what lets it preserve clean-task performance. It is reported to remain robust in both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) settings.

Key insights

LFPM mitigates backdoors in merged models by optimizing an anti-backdoor task vector in feature space, preserving clean-task performance.

Principles

Method

LFPM adds an anti-backdoor task vector to the merged model and optimizes it in feature space under CTL. Gradient accumulation and a loss path integral are used so that backdoor suppression holds along the interpolation path, not only at its endpoint.
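One way to read "loss path integral with gradient accumulation" is sketched below: the integral of a loss over the path theta(alpha) = theta_merged + alpha * tau, for alpha in [0, 1], is approximated with a midpoint rule, and the gradients from all sampled points are accumulated before a single update of tau. This is an interpretation, not the published algorithm. The quadratic toy_loss (which penalizes only the second coordinate), the function names, the step size, and the number of path samples are all assumptions. The actual LFPM objective is defined on features and is not given in this summary.

```python
from typing import List, Tuple

Vec = List[float]


def toy_loss(theta: Vec) -> float:
    """Hypothetical stand-in for a backdoor loss: penalizes coordinate 1 only."""
    return 0.5 * theta[1] ** 2


def toy_grad(theta: Vec) -> Vec:
    """Analytic gradient of toy_loss with respect to theta."""
    return [0.0, theta[1]]


def path_loss_and_grad(theta_merged: Vec, tau: Vec, num_points: int = 8) -> Tuple[float, Vec]:
    """Midpoint-rule estimate of the loss integrated over alpha in [0, 1],
    with the gradient with respect to tau accumulated across path samples."""
    total = 0.0
    grad_acc = [0.0] * len(tau)
    for k in range(num_points):
        alpha = (k + 0.5) / num_points
        theta = [m + alpha * t for m, t in zip(theta_merged, tau)]
        total += toy_loss(theta) / num_points
        g = toy_grad(theta)
        # Chain rule: d theta / d tau = alpha at this path sample.
        grad_acc = [a + alpha * gi / num_points for a, gi in zip(grad_acc, g)]
    return total, grad_acc


theta_merged = [0.5, 1.0]  # hypothetical merged weights
tau = [0.0, 0.0]           # anti-backdoor vector, initialized at zero

initial_loss, _ = path_loss_and_grad(theta_merged, tau)
for _ in range(200):
    _, grad = path_loss_and_grad(theta_merged, tau)
    tau = [t - 0.5 * g for t, g in zip(tau, grad)]  # one step per accumulated gradient
final_loss, _ = path_loss_and_grad(theta_merged, tau)

print(initial_loss, final_loss, tau)
```

Because the toy loss ignores the first coordinate, the learned tau leaves it at zero and only moves the coordinate that stands in for the backdoor. Averaging over the path rather than evaluating only at alpha = 1 is what ties the update to every interpolation point.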

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.