From Parameters to Feature Space: Task Arithmetic for Backdoor Mitigation in Model Merging

2026-06-10 · Source: Machine Learning · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, quick

Summary

Linear Feature Path Minimization (LFPM) is a framework for mitigating backdoor attacks in model merging, a cost-effective way to combine multiple task-specific models into one. Existing task arithmetic defenses edit parameters directly, and they often fail to remove a backdoor without significantly degrading clean-task performance. LFPM instead adds an anti-backdoor task vector to the backdoored merged model. It formulates backdoor robustness from a unified feature-space perspective and uses the Cross-Task Linearity (CTL) framework to guide the optimization of the anti-backdoor task vector, which suppresses the backdoor while preserving clean-task performance. The optimization combines gradient accumulation with a path integral of the loss so that suppression stays robust along the interpolation path. The authors report extensive experiments showing strong robustness against backdoor attacks in both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) settings.
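
For readers less familiar with task arithmetic, the parameter-space operations mentioned above can be sketched as follows. This is a minimal illustration and not the paper's code: the function names, the scaling coefficient `lam`, and the toy two-parameter "models" are assumptions made for the example, and the anti-backdoor vector is a hand-picked placeholder rather than one optimized by LFPM.

```python
# Toy task arithmetic on flat parameter vectors (plain Python lists).
# All names and numbers are illustrative assumptions, not from the paper.

def task_vector(finetuned, pretrained):
    """tau = theta_finetuned - theta_pretrained, element-wise."""
    return [f - p for f, p in zip(finetuned, pretrained)]

def merge(pretrained, task_vectors, lam=1.0):
    """theta_merged = theta_pretrained + lam * sum_i tau_i."""
    merged = list(pretrained)
    for tau in task_vectors:
        merged = [m + lam * t for m, t in zip(merged, tau)]
    return merged

def add_vector(params, vector, scale=1.0):
    """Add one more (scaled) task vector to an existing model."""
    return [p + scale * v for p, v in zip(params, vector)]

pretrained = [0.0, 1.0]
clean_model = [0.5, 1.0]        # hypothetical clean fine-tune
backdoored_model = [0.0, 3.0]   # hypothetical backdoored fine-tune

taus = [
    task_vector(clean_model, pretrained),
    task_vector(backdoored_model, pretrained),
]
merged = merge(pretrained, taus, lam=0.5)

# Placeholder anti-backdoor task vector; LFPM would learn this vector,
# here it is chosen by hand purely to show where it enters.
anti_backdoor = [0.0, -1.0]
defended = add_vector(merged, anti_backdoor)

print(merged, defended)
```

The point of the sketch is only the shape of the operations: merging is a sum of task vectors on top of the pretrained weights, and the defense adds one more vector to the merged model instead of editing the existing ones.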

Key takeaway

For AI security engineers and machine learning engineers deploying merged models, the main caveat is that existing task arithmetic defenses often compromise clean-task performance. LFPM is presented as a stronger alternative: it suppresses backdoors by operating in feature space while preserving clean-task performance. The reported robustness covers both full fine-tuning and Parameter-Efficient Fine-Tuning (PEFT) scenarios, which makes it worth evaluating when securing a unified model.

Key insights

LFPM mitigates backdoors in merged models by optimizing an anti-backdoor task vector in feature space while preserving clean-task performance.

Principles

Backdoor robustness can be formulated from a unified feature-space perspective.
Cross-Task Linearity (CTL) guides anti-backdoor task optimization.
Gradient accumulation and a path integral of the loss keep suppression robust along the interpolation path.
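
As background for the second principle, Cross-Task Linearity is usually stated as an approximate linearity of intermediate features under weight interpolation. The notation below is an assumption for illustration (f^(l) for layer-l features, theta_i and theta_j for two models fine-tuned from the same pretrained checkpoint, alpha for the interpolation coefficient); the summary does not give the paper's exact formulation.

```latex
f^{(\ell)}\bigl(x;\ \alpha\,\theta_i + (1-\alpha)\,\theta_j\bigr)
\;\approx\;
\alpha\, f^{(\ell)}(x;\theta_i) + (1-\alpha)\, f^{(\ell)}(x;\theta_j),
\qquad \alpha \in [0, 1]
```

Read this way, interpolating weights behaves roughly like interpolating features, which is what makes a feature-space objective a plausible guide for choosing a parameter-space task vector.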

Method

LFPM adds an anti-backdoor task vector to the backdoored merged model and optimizes it in feature space under CTL, using gradient accumulation and a path integral of the loss to keep suppression robust along the interpolation path.
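
One way to read "gradient accumulation and a path integral of the loss" is sketched below. Everything specific here is an assumption, since the summary gives no formulas: the path is taken to be theta(alpha) = merged + alpha * v for alpha in [0, 1], the integral is approximated by a midpoint Riemann sum, the model is a toy linear map with a squared-error loss on a single triggered input, and gradients from all path points are accumulated before one update of the hypothetical anti-backdoor vector `v`. It is not the paper's objective or optimizer.

```python
# Toy sketch: minimize a loss averaged along a path, accumulating gradients
# from every path point before a single update. All modelling choices are
# illustrative assumptions, not taken from the source.

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def loss_and_grad(theta, x, y):
    """Squared error of a linear model and its gradient w.r.t. theta."""
    err = dot(theta, x) - y
    return err * err, [2.0 * err * xi for xi in x]

def path_loss_and_grad(merged, v, x, y, num_points=8):
    """Midpoint-rule estimate of the integral over alpha in [0, 1] of
    loss(merged + alpha * v), plus its accumulated gradient w.r.t. v."""
    total_loss = 0.0
    grad_v = [0.0] * len(v)
    for k in range(num_points):
        alpha = (k + 0.5) / num_points
        theta = [m + alpha * vi for m, vi in zip(merged, v)]
        loss, grad_theta = loss_and_grad(theta, x, y)
        total_loss += loss / num_points
        # Chain rule: d(theta)/d(v) = alpha, so each point contributes
        # alpha * grad_theta; contributions are accumulated, not applied.
        grad_v = [g + alpha * gt / num_points
                  for g, gt in zip(grad_v, grad_theta)]
    return total_loss, grad_v

merged = [0.25, 2.0]      # hypothetical backdoored merged model
x_trigger = [0.0, 1.0]    # hypothetical triggered input
y_clean = 1.0             # output the model should give despite the trigger
v = [0.0, 0.0]            # anti-backdoor task vector, trained below

initial_loss, _ = path_loss_and_grad(merged, v, x_trigger, y_clean)
for _ in range(200):
    _, grad = path_loss_and_grad(merged, v, x_trigger, y_clean)
    v = [vi - 0.1 * g for vi, g in zip(v, grad)]  # one step per accumulated gradient
final_loss, _ = path_loss_and_grad(merged, v, x_trigger, y_clean)

print(initial_loss, final_loss, v)
```

In this toy the path starts at the uncorrected merged model, so the averaged loss decreases but cannot reach zero; the sketch only illustrates the accumulate-then-update pattern over a discretized path, not the feature-space loss LFPM actually uses.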

In practice

Apply LFPM to defend against backdoor attacks in model merging.
Use LFPM in full fine-tuning or PEFT settings.

Topics

Model Merging
Backdoor Attacks
Task Arithmetic
Feature Space
Cross-Task Linearity
Parameter-Efficient Fine-Tuning

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning.