SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion

· Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

SHRED (Self-distillation via High-surprisal-only Retain-free Entropy Demotion) is a machine unlearning method for large language models (LLMs) that removes the need for a retain set, a common dependency that complicates deployment. It targets risks such as privacy leakage and copyright infringement by selectively removing memorized content without costly full retraining. SHRED works in two stages. First, it selects the high-information tokens of a forget-set instance, namely the bottom-$P$ tokens by model probability, as "forget positions". Second, it trains the model with a single top-$K$ KL self-distillation objective that demotes the memorized token's logit at forget positions while preserving the original distribution at the remaining (benign) positions. Across four benchmarks (TOFU, MUSE, RWKU, and Hubble), SHRED reaches a new Pareto-optimal trade-off between forget efficacy and model utility, outperforming methods that depend on a retain set. It is also robust to relearning and membership-inference attacks, and it keeps utility stable during sequential unlearning.
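
The first stage, choosing forget positions, fits in a few lines of Python. This is a minimal sketch, not the authors' code. It assumes the original model's per-token log-probabilities for one forget-set instance are already available, treats $P$ as a fraction (0.1 for 10%), and rounds the position count up. The function name `select_forget_positions` is hypothetical.

```python
import math
from typing import List, Sequence


def select_forget_positions(token_logprobs: Sequence[float], demote_pct: float) -> List[int]:
    """Return the indices of the bottom-P fraction of tokens by probability.

    Assumptions (not from the paper): token_logprobs[i] is the original
    model's log-probability of the i-th ground-truth token of one forget-set
    instance, and demote_pct is P as a fraction in (0, 1], e.g. 0.1 for 10%.
    """
    if not 0.0 < demote_pct <= 1.0:
        raise ValueError("demote_pct must be in (0, 1]")
    n = len(token_logprobs)
    if n == 0:
        return []
    # Round the count up (the small epsilon guards against float noise such as
    # 0.3 * 10 == 3.0000000000000004) and keep at least one position.
    k = max(1, min(n, math.ceil(demote_pct * n - 1e-9)))
    # Lowest log-probability = highest surprisal = most information-bearing token.
    # sorted() is stable, so ties are broken in favor of earlier positions.
    by_surprisal = sorted(range(n), key=lambda i: token_logprobs[i])
    return sorted(by_surprisal[:k])


# Example: per-token log-probabilities for a 4-token forget-set instance.
print(select_forget_positions([-0.1, -3.2, -0.05, -5.0], 0.5))  # [1, 3]
```

Ranking by the original model's probability means function words and other predictable tokens are left alone. Only the surprising, content-bearing tokens become targets for demotion.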

Key takeaway

For AI scientists and ML engineers implementing LLM unlearning, SHRED offers a retain-set-free approach that simplifies deployment and reduces data dependencies. It is worth considering for its reported gains in forget efficacy and utility preservation, especially when a curated retain set is unavailable or problematic. In practice, tune the demote percentage $P$ and use small batch sizes to keep model utility stable.

Key insights

LLM unlearning can be retain-set-free by selectively demoting high-information tokens via self-distillation, preserving general utility.

Principles

Method

SHRED selects the bottom-$P$ lowest-probability tokens as forget positions. It then trains with a single top-$K$ KL self-distillation objective that demotes the memorized logits at these positions while preserving the distribution at all other positions.
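
A per-position version of the top-$K$ KL objective might look like the pure-Python sketch below. Several details are assumptions, not the paper's specification:

- The support is the frozen teacher's top-$K$ tokens, plus the memorized token at forget positions if it falls outside them.
- Demotion is modeled as subtracting a fixed, hypothetical `demote_margin` from the memorized token's teacher logit.
- The loss is the forward KL from target to student, averaged over positions.

All function names are illustrative.

```python
import math
from typing import List, Sequence


def log_softmax(xs: Sequence[float]) -> List[float]:
    # Numerically stable log-softmax over a short list of logits.
    m = max(xs)
    lse = m + math.log(sum(math.exp(x - m) for x in xs))
    return [x - lse for x in xs]


def top_k_indices(logits: Sequence[float], k: int) -> List[int]:
    # Indices of the k largest logits, highest first (stable on ties).
    return sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]


def position_kl(teacher: Sequence[float], student: Sequence[float], token: int,
                is_forget: bool, k: int, demote_margin: float) -> float:
    """KL(target || student) restricted to the teacher's top-K support."""
    support = top_k_indices(teacher, k)
    if is_forget and token not in support:
        support.append(token)  # assumption: ensure the memorized token can be demoted
    target_logits = [teacher[i] for i in support]
    if is_forget:
        # Hypothetical demotion rule: lower the memorized token's logit by a margin.
        target_logits[support.index(token)] -= demote_margin
    log_p = log_softmax(target_logits)                   # target distribution
    log_q = log_softmax([student[i] for i in support])   # student, same support
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def shred_style_loss(teacher_seq: Sequence[Sequence[float]],
                     student_seq: Sequence[Sequence[float]],
                     tokens: Sequence[int], forget_positions: Sequence[int],
                     k: int = 3, demote_margin: float = 10.0) -> float:
    """Mean per-position KL.

    teacher_seq: logits of the frozen original model at each position.
    student_seq: logits of the model being unlearned at each position.
    tokens[i]: ground-truth token predicted by the logits at position i.
    """
    if not (len(teacher_seq) == len(student_seq) == len(tokens)) or not tokens:
        raise ValueError("need equal-length, non-empty sequences")
    forget = set(forget_positions)
    total = sum(
        position_kl(t, s, y, i in forget, k, demote_margin)
        for i, (t, s, y) in enumerate(zip(teacher_seq, student_seq, tokens))
    )
    return total / len(tokens)


teacher = [[2.0, 1.0, 0.5, -1.0], [0.0, 3.0, 1.0, 0.2]]
print(shred_style_loss(teacher, teacher, [0, 1], forget_positions=[]))   # 0.0
print(shred_style_loss(teacher, teacher, [0, 1], forget_positions=[0]))  # > 0
```

In this sketch, a benign position uses the teacher's own top-$K$ distribution as its target, so it contributes zero loss until the student drifts. Only forget positions pull the student away from the original model. That pattern shows how a single objective can both forget and preserve without a separate retain set.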

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.