From Parameters to Feature Space: Task Arithmetic for Backdoor Mitigation in Model Merging
Summary
Linear Feature Path Minimization (LFPM) is a framework for mitigating backdoor attacks in model merging, a cost-effective way to combine multiple task-specific models into one. Existing task arithmetic defenses edit parameters directly, and as a result they often fail to remove backdoors without significantly degrading clean-task performance. LFPM instead adds an anti-backdoor task vector to the backdoored merged model. It formulates backdoor robustness from a unified feature-space perspective and uses the Cross-Task Linearity (CTL) framework to guide the optimization of that vector, which suppresses backdoors while preserving clean-task performance. An optimization mechanism based on gradient accumulation and a path integral of the loss keeps backdoor suppression robust along the interpolation path. The reported experiments show strong robustness against backdoor attacks in both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) settings.
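To make the parameter-space side of this concrete, here is a minimal sketch of standard task arithmetic in plain Python. It is not the authors' code. The function names, the toy two-element "models", and the hand-picked anti_backdoor vector are all assumptions made for illustration; LFPM would obtain that vector by optimization in feature space rather than by hand.

```python
def task_vector(pretrained, finetuned):
    """Task vector = fine-tuned weights minus pretrained weights, per tensor."""
    return {name: [f - p for f, p in zip(finetuned[name], pretrained[name])]
            for name in pretrained}

def merge(base, task_vectors, scale=1.0):
    """Add scaled task vectors to a base model (standard task arithmetic)."""
    merged = {name: list(values) for name, values in base.items()}
    for tv in task_vectors:
        for name in merged:
            merged[name] = [m + scale * d for m, d in zip(merged[name], tv[name])]
    return merged

# Toy "models": one parameter tensor with two entries (hypothetical values).
pretrained = {"w": [1.0, 2.0]}
clean_model = {"w": [1.5, 2.0]}        # fine-tuned on a clean task
backdoored_model = {"w": [1.0, 4.0]}   # fine-tuned on a poisoned task

task_vectors = [task_vector(pretrained, clean_model),
                task_vector(pretrained, backdoored_model)]
merged = merge(pretrained, task_vectors)

# Placeholder anti-backdoor vector, chosen by hand for this toy only.
anti_backdoor = {"w": [0.0, -2.0]}
defended = merge(merged, [anti_backdoor])
```

In this toy the anti-backdoor vector cancels the poisoned direction exactly. Real backdoors are not isolated in a single coordinate, which is why the vector has to be learned.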
Key takeaway
For AI security engineers and machine learning engineers deploying merged models: existing task arithmetic defenses often compromise clean-task performance, so Linear Feature Path Minimization (LFPM) is worth considering as an alternative. LFPM suppresses backdoors by operating in feature space rather than editing parameters directly, with the aim of preserving clean-task performance. It is reported to be robust in both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) scenarios, which makes it a candidate for securing unified models.
Key insights
LFPM mitigates backdoors in merged models by optimizing an anti-backdoor task vector in feature space while preserving clean-task performance.
Principles
- Backdoor robustness can be formulated from a unified feature-space perspective.
- Cross-Task Linearity (CTL) guides anti-backdoor task optimization.
- Gradient accumulation and a path integral of the loss keep suppression robust along the interpolation path.
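For readers unfamiliar with Cross-Task Linearity, the property is usually stated as follows; the notation here is an assumption and may differ from the paper's. For two models with weights \theta_i and \theta_j fine-tuned from the same pretrained checkpoint, and f^{(\ell)}(\theta; x) denoting the layer-\ell features of input x, the features of the weight-interpolated model are approximately the interpolation of the two models' features:

```latex
f^{(\ell)}\bigl(\alpha\,\theta_i + (1-\alpha)\,\theta_j;\; x\bigr)
\;\approx\;
\alpha\, f^{(\ell)}(\theta_i;\, x) + (1-\alpha)\, f^{(\ell)}(\theta_j;\, x),
\qquad \alpha \in [0, 1].
```

This is what lets a defense reason about features along a straight line in weight space instead of only about the endpoint weights.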
Method
LFPM introduces an anti-backdoor task vector and optimizes it in feature space under CTL. Gradient accumulation and a path integral of the loss are used to keep backdoor suppression robust along the interpolation path.
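One way to read "path integral of the loss with gradient accumulation" is sketched below. This is an interpretation of the summary, not the paper's algorithm. It assumes the path runs from the merged weights theta_merged to theta_merged + tau, approximates the integral by averaging a loss at a few interpolation points alpha, and accumulates the gradient with respect to tau across those points before taking a single update step. The quadratic toy_loss_grad is a hypothetical stand-in for a real feature-space backdoor loss, and all names are illustrative.

```python
def toy_loss_grad(theta, target):
    """Hypothetical stand-in loss 0.5 * ||theta - target||^2 and its gradient."""
    diff = [t - g for t, g in zip(theta, target)]
    return 0.5 * sum(d * d for d in diff), diff

def path_loss_and_grad(theta_merged, tau, target, num_points=5):
    """Average the loss over points theta_merged + alpha * tau, alpha in (0, 1],
    accumulating the gradient with respect to tau across all points."""
    alphas = [(k + 1) / num_points for k in range(num_points)]
    total_loss = 0.0
    grad_tau = [0.0] * len(tau)
    for alpha in alphas:
        theta = [m + alpha * t for m, t in zip(theta_merged, tau)]
        loss, grad_theta = toy_loss_grad(theta, target)
        total_loss += loss / num_points
        # Chain rule: d(theta)/d(tau) = alpha at this point on the path.
        for i, g in enumerate(grad_theta):
            grad_tau[i] += alpha * g / num_points
    return total_loss, grad_tau

def optimize_tau(theta_merged, target, steps=200, lr=0.5, num_points=5):
    """Plain gradient descent on tau using the accumulated path gradient."""
    tau = [0.0] * len(theta_merged)
    for _ in range(steps):
        _, grad = path_loss_and_grad(theta_merged, tau, target, num_points)
        tau = [t - lr * g for t, g in zip(tau, grad)]
    return tau

theta_merged = [1.0, -2.0]   # toy merged weights (hypothetical)
target = [0.0, 0.0]          # toy low-loss region (hypothetical)
loss_before, _ = path_loss_and_grad(theta_merged, [0.0, 0.0], target)
tau = optimize_tau(theta_merged, target)
loss_after, _ = path_loss_and_grad(theta_merged, tau, target)
```

Minimizing the averaged loss penalizes every point on the path rather than only the endpoint, so the optimized tau generally differs from the vector that would minimize the loss at alpha = 1 alone.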
In practice
- Apply LFPM to defend against backdoor attacks in model merging.
- Use LFPM in full fine-tuning or PEFT settings.
Topics
- Model Merging
- Backdoor Attacks
- Task Arithmetic
- Feature Space
- Cross-Task Linearity
- Parameter-Efficient Fine-Tuning
Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.