Safe-FedLLM: Delving into the Safety of Federated Large Language Models
Summary
Safe-FedLLM is a probe-based defense framework that hardens federated large language models (FedLLMs) against malicious clients. It targets a weakness of federated LLM training, where traditional defenses often fail because of the distinct characteristics of parameter-efficient fine-tuning (PEFT) with Low-Rank Adaptation (LoRA) weights. Researchers from Hainan University, Tsinghua University, and Shanghai Jiao Tong University found that FedLLMs are highly susceptible to attacks, but also that LoRA weights from benign and malicious clients exhibit distinguishable patterns. Safe-FedLLM exploits these patterns with a LoRA-Probe that detects malicious updates, combined with a Safety Defense Module that operates at three levels: Step-Level, Client-Level, and Shadow-Level. In experiments on Llama3.1-8B and Qwen2.5-7B with malicious client ratios from 20% to 50%, Safe-FedLLM significantly improves robustness and safety without compromising performance, and it increases total training time by only 3.2%.
Key takeaway
Research scientists building secure federated learning systems for LLMs should consider integrating probe-based defenses that analyze LoRA weights. The approach is a lightweight and effective way to identify and suppress malicious client updates, and it maintains model safety and utility even under high attack intensity, which is crucial for robust real-world deployments.
Key insights
LoRA weight patterns can effectively distinguish malicious from benign updates in federated LLM training.
Principles
- FedLLMs are highly vulnerable to malicious client attacks.
- LoRA weights exhibit separable intrinsic properties for different client types.
Method
Safe-FedLLM uses an offline-trained LoRA-Probe to classify client-generated LoRA weight updates as malicious or benign, then applies multi-level defense modules (Step, Client, Shadow) and security-weighted aggregation to mitigate threats.
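The aggregation step described above can be illustrated with a minimal sketch. It is not the paper's implementation: the function name, the 0.5 threshold, and the choice to weight each kept client by its normalized benign score are all assumptions made for illustration, and the probe scores are taken as given inputs.

```python
from typing import Dict, List, Optional


def security_weighted_aggregate(
    updates: Dict[str, List[float]],
    benign_scores: Dict[str, float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Optional[List[float]]:
    """Average flattened LoRA updates, weighting each client by its benign score.

    updates:       client id -> flattened LoRA weight update
    benign_scores: client id -> probe output in [0, 1], higher means more likely benign
    """
    # Drop clients the probe considers more likely malicious than benign.
    kept = {cid: s for cid, s in benign_scores.items() if s >= threshold}
    if not kept:
        # No trusted client this round; the caller keeps the previous global adapter.
        return None

    total = sum(kept.values())
    dim = len(updates[next(iter(kept))])
    aggregated = [0.0] * dim
    for cid, score in kept.items():
        weight = score / total  # normalized security weight
        for i, value in enumerate(updates[cid]):
            aggregated[i] += weight * value
    return aggregated


# Hypothetical round: two benign clients and one suspicious client.
round_updates = {"a": [1.0, 1.0], "b": [3.0, 3.0], "m": [100.0, -100.0]}
round_scores = {"a": 0.9, "b": 0.9, "m": 0.1}
global_update = security_weighted_aggregate(round_updates, round_scores)
```

In this toy round the suspicious client "m" falls below the threshold, so only "a" and "b" contribute to the aggregate. A softer variant could keep every client and rely on the weights alone to suppress low-scoring updates.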
In practice
- Implement probe-based discrimination on LoRA weights.
- Utilize a shadow LoRA branch for stable security signal generation.
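As a rough illustration of the first point, the sketch below trains a small logistic-regression probe offline on flattened LoRA weight vectors. This summary does not describe the actual LoRA-Probe architecture, so the linear model, the synthetic data generator, and every name and hyperparameter here are assumptions chosen only to show the idea of separating the two kinds of update.

```python
import math
import random
from typing import List, Tuple


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def make_synthetic_updates(
    n_per_class: int = 50, dim: int = 8, seed: int = 0
) -> Tuple[List[List[float]], List[int]]:
    """Toy stand-in for flattened LoRA weights (assumption, not real data).

    Benign updates (label 0) are centered at 0.0, malicious ones (label 1) at 0.5.
    """
    rng = random.Random(seed)
    features, labels = [], []
    for label, mean in ((0, 0.0), (1, 0.5)):
        for _ in range(n_per_class):
            features.append([rng.gauss(mean, 0.1) for _ in range(dim)])
            labels.append(label)
    return features, labels


def train_probe(
    features: List[List[float]], labels: List[int], lr: float = 0.5, epochs: int = 200
) -> Tuple[List[float], float]:
    """Fit logistic regression with full-batch gradient descent."""
    dim, n = len(features[0]), len(features)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for i in range(dim):
                grad_w[i] += err * x[i]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def malicious_probability(w: List[float], b: float, x: List[float]) -> float:
    """Probe output: estimated probability that update x is malicious."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


features, labels = make_synthetic_updates()
probe_w, probe_b = train_probe(features, labels)
```

At serving time, a server could call `malicious_probability` on each incoming update and pass the result to whatever filtering or weighting rule it uses. Real LoRA matrices are far higher-dimensional than this toy data, so some feature reduction before the probe would likely be needed.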
Topics
- Federated Large Language Models
- LoRA Weights
- Malicious Client Attacks
- Probe-based Defense
- Model Poisoning
Best for: Research Scientist, AI Scientist, AI Security Engineer, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.