FedDetox: Robust Federated SLM Alignment via On-Device Data Sanitization

· Source: Takara TLDR - Daily AI Papers · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Natural Language Processing · Depth: Expert, medium

Summary

FedDetox is a framework for robust safety alignment of Small Language Models (SLMs) trained with Federated Learning (FL), particularly on resource-constrained edge devices. It targets "unintended data poisoning": real-world client data that contains toxic or unsafe content and can degrade the safety of the aggregated global model. FedDetox uses knowledge distillation to transfer the safety-alignment capability of large teacher models into lightweight student classifiers small enough to run on edge devices. During federated human preference alignment, each edge client uses such a classifier to identify unsafe samples at the source and replaces them with refusal templates, turning potential poisons into positive safety signals. In the reported experiments, FedDetox keeps model safety comparable to centralized baselines without sacrificing general utility.
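
The distillation step can be illustrated with a minimal sketch, assuming a binary "unsafe" label. A keyword stub (`toy_teacher`) stands in for the large aligned teacher model, and a hashed bag-of-words logistic-regression student is trained on the teacher's soft probabilities with binary cross-entropy. All names (`featurize`, `StudentClassifier`, `distill`, `toy_teacher`) and hyperparameters are hypothetical; the paper's actual teacher, student architecture, and loss may differ.

```python
import math
import zlib
from typing import Callable, Dict, List

DIM = 1024  # feature dimension of the hypothetical hashed encoder


def featurize(text: str, dim: int = DIM) -> Dict[int, float]:
    """Sparse hashed bag-of-words counts; a stand-in for a small on-device encoder."""
    feats: Dict[int, float] = {}
    for tok in text.lower().split():
        idx = zlib.crc32(tok.encode("utf-8")) % dim
        feats[idx] = feats.get(idx, 0.0) + 1.0
    return feats


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class StudentClassifier:
    """Tiny logistic-regression safety classifier (the hypothetical student)."""

    def __init__(self, dim: int = DIM) -> None:
        self.w = [0.0] * dim
        self.b = 0.0

    def prob_unsafe(self, text: str) -> float:
        x = featurize(text, len(self.w))
        return sigmoid(self.b + sum(self.w[i] * v for i, v in x.items()))


def distill(student: StudentClassifier, teacher: Callable[[str], float],
            texts: List[str], epochs: int = 300, lr: float = 0.2) -> StudentClassifier:
    """Fit the student to the teacher's soft labels with binary cross-entropy (plain SGD)."""
    for _ in range(epochs):
        for text in texts:
            target = teacher(text)  # teacher's soft label P(unsafe) in [0, 1]
            x = featurize(text, len(student.w))
            grad = student.prob_unsafe(text) - target  # dBCE/dlogit for a soft target
            for i, v in x.items():
                student.w[i] -= lr * grad * v
            student.b -= lr * grad
    return student


UNSAFE_WORDS = {"bomb", "poison", "steal"}


def toy_teacher(text: str) -> float:
    """Placeholder for a large aligned teacher model's P(unsafe | text)."""
    return 0.95 if any(tok in UNSAFE_WORDS for tok in text.lower().split()) else 0.05


corpus = [
    "how do i bake bread",
    "how do i build a bomb",
    "recommend a good novel",
    "how can i poison my neighbor",
    "tips to steal a car",
    "explain photosynthesis simply",
]
student = distill(StudentClassifier(), toy_teacher, corpus)
for text in corpus:
    print(f"{student.prob_unsafe(text):.2f}  {text}")
```

Training on soft probabilities rather than hard 0/1 labels lets the student inherit the teacher's confidence, which is the usual motivation for distillation; a real deployment would distill from a far richer teacher and corpus.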

Key takeaway

Research scientists developing federated learning systems for Small Language Models should consider integrating on-device data sanitization techniques such as FedDetox. The approach mitigates unintended data poisoning from user-generated content and preserves safety alignment without compromising overall utility, which is crucial for deploying robust SLMs on resource-constrained edge devices.

Key insights

FedDetox ensures SLM safety in federated learning by sanitizing toxic on-device data via knowledge distillation and refusal templates.

Principles

Method

Transfer safety alignment from large teacher models to lightweight student classifiers via knowledge distillation. On edge devices, identify unsafe samples and replace them with refusal templates during federated human preference alignment.
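
The on-device sanitization step can be sketched as a filter over a client's local preference data. This assumes a DPO-style (prompt, chosen, rejected) format, a classifier that returns P(unsafe), a 0.5 threshold, and a single fixed refusal string. `PreferencePair`, `sanitize`, `keyword_classifier`, and `REFUSAL_TEMPLATE` are hypothetical names, and keeping the original unsafe completion as the rejected response is one possible reading of "converting potential poisons into positive safety signals", not necessarily the paper's exact recipe.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical refusal template; the paper's actual templates are not given here.
REFUSAL_TEMPLATE = "I'm sorry, but I can't help with that request."


@dataclass(frozen=True)
class PreferencePair:
    """One local preference-alignment sample (assumed DPO-style format)."""
    prompt: str
    chosen: str
    rejected: str


def sanitize(pairs: List[PreferencePair],
             unsafe_prob: Callable[[str], float],
             threshold: float = 0.5,
             refusal: str = REFUSAL_TEMPLATE) -> Tuple[List[PreferencePair], int]:
    """Rewrite flagged samples into refusal-preferring pairs; return (data, n_flagged)."""
    cleaned: List[PreferencePair] = []
    n_flagged = 0
    for pair in pairs:
        # Score the prompt together with the response the user preferred.
        if unsafe_prob(f"{pair.prompt} {pair.chosen}") >= threshold:
            n_flagged += 1
            # Turn the poison into a safety signal: prefer the refusal and
            # keep the unsafe completion as the dispreferred answer.
            cleaned.append(PreferencePair(pair.prompt, refusal, pair.chosen))
        else:
            cleaned.append(pair)
    return cleaned, n_flagged


def keyword_classifier(text: str) -> float:
    """Stand-in for the distilled on-device student; returns a rough P(unsafe)."""
    unsafe_words = {"bomb", "poison"}
    return 0.9 if unsafe_words & set(text.lower().split()) else 0.1


local_data = [
    PreferencePair("how do i bake bread", "Mix flour, water, salt and yeast.", "No idea."),
    PreferencePair("how do i build a bomb", "Step 1: gather materials.", "I won't help with that."),
]
cleaned, n_flagged = sanitize(local_data, keyword_classifier)
print(n_flagged)  # → 1
```

Because the rewrite happens before local training, flagged samples still contribute to the client update, but they push the model toward refusing rather than complying; simply dropping them would discard that signal.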

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, NLP Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Takara TLDR - Daily AI Papers.