SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion
Summary
SHRED (Self-distillation via High-surprisal-only Retain-free Entropy Demotion) is a machine unlearning method for large language models (LLMs) that needs no retain set, a dependency that complicates the deployment of many existing methods. It addresses risks such as privacy leakage and copyright infringement by selectively removing memorized content without costly full retraining. SHRED operates in two stages. First, it selects the high-information tokens of a forget-set instance (the bottom-$P$ tokens ranked by model-assigned probability) as "forget positions". Second, it trains the model with a single top-$K$ KL self-distillation objective that demotes the memorized token's logit at forget positions while preserving the original distribution at benign positions. Evaluated on four benchmarks (TOFU, MUSE, RWKU, and Hubble), SHRED achieves a new Pareto-optimal trade-off between forget efficacy and model utility, outperforming retain-set-dependent methods. It is also robust against relearning and membership-inference attacks, and it keeps utility stable during sequential unlearning.
Key takeaway
For AI scientists and ML engineers implementing LLM unlearning, SHRED offers a retain-set-free approach that simplifies deployment and reduces data dependencies. Consider it when a curated retain set is unavailable or problematic and you still need strong forget efficacy with preserved utility. Tune the demote percentage $P$ and use small batch sizes to keep model utility stable.
Key insights
LLM unlearning can work without a retain set: selectively demoting high-information tokens via self-distillation removes memorized content while preserving general utility.
Principles
- Memorized knowledge concentrates in high-information tokens within forget-set instances.
- Self-distillation can preserve general model utility during targeted unlearning.
- Small batch sizes improve utility preservation in machine unlearning processes.
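
The first principle can be made concrete with surprisal, $-\log p$, where $p$ is the probability the model assigns to the ground-truth token. The sketch below ranks positions by surprisal and keeps the top fraction $P$ as forget positions. It is a minimal illustration, not the authors' implementation: the function name, the example probabilities, and the choice to round the count up are all assumptions.

```python
import math

def select_forget_positions(token_probs, demote_fraction):
    """Return indices of the lowest-probability (highest-surprisal) tokens.

    token_probs: probability the model assigned to each ground-truth token
                 (assumed strictly positive).
    demote_fraction: the fraction P of positions to mark as forget positions.
    """
    if not 0.0 <= demote_fraction <= 1.0:
        raise ValueError("demote_fraction must be in [0, 1]")
    # Surprisal in nats: less probable tokens carry more information.
    surprisal = [-math.log(p) for p in token_probs]
    # Assumption: round the number of forget positions up.
    n_forget = math.ceil(demote_fraction * len(token_probs))
    # Rank positions from most to least surprising; ties keep original order.
    ranked = sorted(range(len(token_probs)), key=lambda i: -surprisal[i])
    return sorted(ranked[:n_forget])

# Hypothetical per-token probabilities for a six-token forget-set sentence.
probs = [0.90, 0.05, 0.80, 0.10, 0.95, 0.02]
forget_positions = select_forget_positions(probs, demote_fraction=0.5)
print(forget_positions)  # [1, 3, 5]
```

With $P$ = 50%, the three least probable tokens are selected and the three confidently predicted ones are left as benign positions.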
Method
SHRED selects the bottom-$P$ tokens by probability as forget positions, then trains with a single top-$K$ KL self-distillation objective that demotes the memorized token's logit at these positions while preserving the original distribution at all other positions.
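
One way to picture the objective is as a KL divergence between a teacher target and the student, both restricted to the teacher's top-$K$ tokens. The sketch below is a guess at its shape, since the summary does not give the exact demotion rule. It assumes the teacher is a frozen copy of the model, that demotion lowers the memorized token's logit to the smallest top-$K$ logit, and that both distributions are renormalized over the top-$K$ ids. All names and numbers are hypothetical.

```python
import math

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def demoted_topk_target(teacher_logits, memorized_id, k, is_forget):
    """Build a top-K teacher target; demote the memorized token at forget positions."""
    # Keep the K highest-logit vocabulary ids from the (assumed frozen) teacher.
    top_ids = sorted(range(len(teacher_logits)), key=lambda i: -teacher_logits[i])[:k]
    logits = [teacher_logits[i] for i in top_ids]
    if is_forget and memorized_id in top_ids:
        # Assumed demotion rule: lower the memorized logit to the smallest top-K logit.
        logits[top_ids.index(memorized_id)] = min(logits)
    return top_ids, softmax(logits)

def topk_kl(target_probs, student_logits, top_ids):
    """KL(target || student), with the student renormalized over the same top-K ids."""
    student_probs = softmax([student_logits[i] for i in top_ids])
    return sum(t * math.log(t / s) for t, s in zip(target_probs, student_probs) if t > 0)

# Hypothetical logits over a five-token vocabulary; token 0 is the memorized one.
teacher = [4.0, 2.0, 1.0, 0.5, -1.0]
ids, target_forget = demoted_topk_target(teacher, memorized_id=0, k=3, is_forget=True)
_, target_benign = demoted_topk_target(teacher, memorized_id=0, k=3, is_forget=False)

# The student starts as a copy of the teacher, so only forget positions yield a loss.
loss_forget = topk_kl(target_forget, teacher, ids)  # positive: the student must move
loss_benign = topk_kl(target_benign, teacher, ids)  # zero: nothing to change
```

Under these assumptions a single loss covers both cases: at benign positions the target equals the model's own distribution, so the gradient there only resists drift, while at forget positions the target pulls probability away from the memorized token.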
In practice
- Tune the demote percentage $P$ (e.g., 50-75%) for the best forgetting-utility trade-off.
- Use small batch sizes (e.g., 1 or 2) for more stable model utility during training.
- Employ an 8-bit optimizer for memory efficiency without degrading unlearning quality.
Topics
- Machine Unlearning
- Large Language Models
- Self-Distillation
- Logit Demotion
- Data Privacy
- Model Utility
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.