SHRED: Retain-Set-Free Unlearning via Self-Distillation with Logit Demotion

2026-06-06 · Source: cs.AI updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

SHRED (Self-distillation via High-surprisal-only Retain-free Entropy Demotion) is a machine unlearning method for large language models (LLMs) that removes the need for a retain set, a dependency that complicates deployment of most existing methods. It targets risks such as privacy leakage and copyright infringement by removing memorized content selectively, without costly full retraining. SHRED operates in two stages. First, it marks the high-information tokens of each forget-set instance, meaning the bottom $P$ percent of tokens ranked by model probability, as "forget positions". Second, it trains the model with a single top-$K$ KL self-distillation objective that demotes the memorized token's logit at forget positions while preserving the original distribution at the remaining, benign positions. Evaluated on four benchmarks (TOFU, MUSE, RWKU, and Hubble), SHRED is reported to reach a new Pareto-optimal trade-off between forget efficacy and model utility, outperforming methods that depend on a retain set. It is also reported to be robust against relearning and membership-inference attacks, and to keep utility stable during sequential unlearning.

Key takeaway

For AI scientists and ML engineers implementing LLM unlearning, SHRED offers a retain-set-free approach that simplifies deployment and reduces data dependencies. Consider it when a curated retain set is unavailable or problematic, since the reported results show better forget efficacy and utility preservation than retain-set-dependent baselines. Tune the demote percentage $P$ and use small batch sizes to keep model utility stable.

Key insights

LLM unlearning does not require a retain set: demoting only the high-information tokens of the forget set through self-distillation removes memorized content while preserving general utility.

Principles

Memorized knowledge concentrates in high-information tokens within forget-set instances.
Self-distillation can preserve general model utility during targeted unlearning.
Small batch sizes improve utility preservation in machine unlearning processes.

Method

SHRED selects the bottom $P$ percent of tokens by probability in each forget-set instance as forget positions, then trains with a single top-$K$ KL self-distillation objective that demotes the memorized token's logit at those positions while preserving the original distribution at all other positions.
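The two stages can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the summary does not say how the logit is demoted or how the top-$K$ support is built, so the sketch assumes (a) demotion sets the memorized token's teacher logit to the row minimum, and (b) the support is the teacher's top-$K$ tokens with the memorized token always included. All function names are hypothetical.

```python
import numpy as np

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def select_forget_positions(logits, token_ids, p_percent):
    """Stage 1: mark the bottom-P% lowest-probability tokens as forget positions."""
    logp = log_softmax(logits)
    tok_logp = logp[np.arange(len(token_ids)), token_ids]
    n_forget = int(np.ceil(len(token_ids) * p_percent / 100.0))
    order = np.argsort(tok_logp)  # ascending: lowest probability first
    mask = np.zeros(len(token_ids), dtype=bool)
    mask[order[:n_forget]] = True
    return mask

def build_targets(teacher_logits, token_ids, forget_mask, k):
    """Stage 2 targets: top-K teacher distribution, demoted at forget positions."""
    n_pos = teacher_logits.shape[0]
    rows = np.arange(n_pos)
    # Assumption (b): force the memorized token into the top-K support.
    key = teacher_logits.copy()
    key[rows, token_ids] = np.inf
    topk_idx = np.argpartition(-key, k - 1, axis=-1)[:, :k]
    # Assumption (a): demote the memorized logit to the row minimum.
    edited = teacher_logits.copy()
    f_rows = np.flatnonzero(forget_mask)
    edited[f_rows, token_ids[f_rows]] = teacher_logits[f_rows].min(axis=-1)
    topk_logits = np.take_along_axis(edited, topk_idx, axis=-1)
    target = np.exp(log_softmax(topk_logits))
    return topk_idx, target

def topk_kl_per_position(student_logits, topk_idx, target):
    """KL(target || student), both renormalized over the top-K support."""
    student_topk = np.take_along_axis(student_logits, topk_idx, axis=-1)
    student_logp = log_softmax(student_topk)
    return (target * (np.log(target) - student_logp)).sum(axis=-1)
```

In this sketch the teacher is a frozen copy of the model before unlearning. At benign positions the target equals the teacher's own top-$K$ distribution, so an unchanged student incurs zero loss there, and the training signal comes only from the forget positions. Averaging `topk_kl_per_position` over positions gives a single scalar objective.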

In practice

Tune the demote percentage $P$ (e.g., 50-75%) to balance forgetting against utility.
Use small batch sizes (e.g., 1 or 2) for more stable model utility during training.
Employ an 8-bit optimizer for memory efficiency without degrading unlearning quality.
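The recommendations above can be collected into a small configuration object for a hyperparameter sweep. This is a hypothetical sketch: the class, its field names, and the `sweep` helper are illustrative and come from neither the paper nor any library. `top_k` has no default because the summary gives no value for $K$.

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class UnlearnConfig:
    # All field names are hypothetical, chosen for illustration only.
    top_k: int                       # K in the top-K KL objective (no value given in the summary)
    demote_percent: float = 50.0     # P: share of tokens marked as forget positions
    batch_size: int = 1              # small batches are reported to stabilize utility
    use_8bit_optimizer: bool = True  # memory saving, reported not to hurt unlearning quality

    def __post_init__(self):
        # Reject values that cannot describe a valid run.
        if not 0 < self.demote_percent <= 100:
            raise ValueError("demote_percent must be in (0, 100]")
        if self.top_k < 1 or self.batch_size < 1:
            raise ValueError("top_k and batch_size must be positive")

def sweep(top_k, percents=(50.0, 62.5, 75.0), batch_sizes=(1, 2)):
    """Enumerate configurations over the suggested ranges of P and batch size."""
    return [
        UnlearnConfig(top_k=top_k, demote_percent=p, batch_size=b)
        for p, b in product(percents, batch_sizes)
    ]
```

Calling `sweep(top_k=...)` with a chosen $K$ yields one configuration per combination of $P$ and batch size, which can then be compared on forget efficacy and utility.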

Topics

Machine Unlearning
Large Language Models
Self-Distillation
Logit Demotion
Data Privacy
Model Utility

Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.AI updates on arXiv.org.