PURGE: Projected Unlearning via Retain-Guided Erasure
Summary
PURGE is a machine unlearning algorithm introduced on 2026-06-02 that builds on the duality between continual learning and machine unlearning. It adapts gradient projection from A-GEM (Averaged Gradient Episodic Memory) so that each unlearning step does not increase the retain-set loss. The algorithm also performs multi-layer representation erasure, pushing forget-set activations in intermediate layers towards the retain distribution so that information is removed below the output layer as well. A key design choice is its retain-confusion target: rather than pushing forget-set predictions towards a uniform distribution, it aims for the model's natural confusion pattern on retain data, which makes unlearned models harder to distinguish from models retrained from scratch. PURGE uses two self-regulating stopping criteria (a retain-loss budget and a forget-accuracy target), which removes the need for manual epoch tuning. In experiments on five datasets (CIFAR-10, MNIST, SVHN, STL10, PathMNIST) and 22 class-level forgetting tasks, PURGE kept retain accuracy above 96% and achieved a Membership Inference Attack (MIA) AUROC close to 0.5, outperforming several published baselines.
Key takeaway
For machine learning engineers building models that must support data erasure, PURGE reports an MIA AUROC close to 0.5 while keeping retain accuracy above 96%. Consider adopting its gradient projection and retain-confusion target. Its two stopping criteria also remove manual epoch tuning from the unlearning pipeline.
Key insights
PURGE unlearns data by projecting gradients so that retain-set performance is preserved and by pushing hidden representations of forget data towards the retain distribution.
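The representation-erasure idea can be illustrated with a small sketch. The summary does not give the exact loss, so this assumes the simplest stand-in: for each chosen layer, penalise the squared distance between every forget-set activation and the mean retain-set activation. The function name `erasure_loss` and the dict-of-lists layout are hypothetical, and matching only the mean is a crude proxy for matching the full retain distribution.

```python
def erasure_loss(forget_acts, retain_acts):
    """Sum over layers of the mean squared distance between each
    forget activation and the retain mean activation (assumed loss,
    not taken from the original article).

    Both arguments map a layer name to a list of activation vectors.
    """
    total = 0.0
    for layer, f_rows in forget_acts.items():
        r_rows = retain_acts[layer]
        dim = len(r_rows[0])
        # Mean retain activation for this layer.
        r_mean = [sum(r[d] for r in r_rows) / len(r_rows) for d in range(dim)]
        # Average squared distance of forget activations to that mean.
        layer_loss = sum(
            sum((f[d] - r_mean[d]) ** 2 for d in range(dim)) for f in f_rows
        ) / len(f_rows)
        total += layer_loss
    return total


if __name__ == "__main__":
    retain = {"layer1": [[0.0, 0.0], [2.0, 2.0]]}
    forget = {"layer1": [[3.0, 1.0]]}
    print(erasure_loss(forget, retain))
```

Minimising a term like this moves forget-set activations towards where retain activations sit, which is the direction of the effect described above.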
Principles
- Machine unlearning and continual learning are dual problems.
- Target the model's retain-confusion pattern, not a uniform distribution, for privacy.
- Self-regulating criteria remove manual tuning.
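To make the retain-confusion principle concrete, here is one plausible reading, sketched under stated assumptions. The summary only says the target is the model's natural confusion pattern on retain data. Averaging the model's softmax outputs over retain samples, zeroing the forgotten class, and renormalising is an assumption made for illustration, and `retain_confusion_target` is a hypothetical name.

```python
def retain_confusion_target(retain_probs, forget_class):
    """Build a target distribution for forget-set samples from the
    model's average predicted probabilities on retain samples.

    Assumed construction (not from the original article): mask the
    forgotten class and renormalise the remaining probability mass.
    """
    n_classes = len(retain_probs[0])
    mean = [
        sum(row[k] for row in retain_probs) / len(retain_probs)
        for k in range(n_classes)
    ]
    mean[forget_class] = 0.0
    total = sum(mean)
    return [m / total for m in mean]


if __name__ == "__main__":
    # Softmax outputs on two retain samples, three classes (made-up values).
    probs = [[0.1, 0.6, 0.3], [0.1, 0.2, 0.7]]
    target = retain_confusion_target(probs, forget_class=0)
    uniform = [0.0, 0.5, 0.5]  # uniform over the remaining classes
    print(target, uniform)
```

Unlike the uniform alternative, the resulting target keeps the uneven spread across remaining classes that the model already shows on retain data.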
Method
PURGE adapts A-GEM's gradient projection to constrain each unlearning step so that it does not increase the retain-set loss. It also performs multi-layer representation erasure, pushing forget-set activations in intermediate layers towards the retain distribution.
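The projection step can be sketched with the standard A-GEM rule applied to an unlearning setting. Here `g_forget` is the gradient of the unlearning objective, `g_retain` is the gradient of the retain-set loss, and the update is assumed to be a plain descent step. Both names and the flat-list representation are illustrative; PURGE's exact adaptation may differ.

```python
def project_gradient(g_forget, g_retain):
    """A-GEM-style projection on flattened gradients.

    If a descent step along g_forget would raise the retain loss to
    first order (negative dot product with g_retain), remove the
    conflicting component; otherwise return the gradient unchanged.
    """
    dot = sum(a * b for a, b in zip(g_forget, g_retain))
    if dot >= 0.0:
        return list(g_forget)
    # dot < 0 implies g_retain is non-zero, so the division is safe.
    norm_sq = sum(b * b for b in g_retain)
    scale = dot / norm_sq
    return [a - scale * b for a, b in zip(g_forget, g_retain)]


if __name__ == "__main__":
    g = project_gradient([1.0, -2.0], [0.0, 1.0])
    print(g)
```

After projection the update is orthogonal to the retain gradient, so the guarantee is a first-order one: with a sufficiently small step size the retain loss does not increase.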
In practice
- Implement gradient projection for unlearning.
- Use retain-confusion as an unlearning target.
- Apply multi-layer representation erasure.
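To run these steps without a fixed epoch count, the two stopping criteria from the summary can be checked after every update. The sketch below assumes the retain-loss budget is an absolute allowance above the pre-unlearning retain loss. That definition, the function name `stop_reason`, and all thresholds shown are hypothetical.

```python
def stop_reason(retain_loss, baseline_retain_loss, forget_acc,
                retain_loss_budget, forget_acc_target):
    """Return why unlearning should stop, or None to continue.

    Assumed semantics (not confirmed by the original article):
    - stop if retain loss exceeds baseline + budget;
    - stop if forget accuracy has dropped to the target or below.
    """
    if retain_loss > baseline_retain_loss + retain_loss_budget:
        return "retain_budget_exceeded"
    if forget_acc <= forget_acc_target:
        return "forget_target_reached"
    return None


if __name__ == "__main__":
    # Made-up (retain_loss, forget_acc) pairs measured after each step.
    history = [(0.20, 0.90), (0.21, 0.40), (0.22, 0.01)]
    for step, (r_loss, f_acc) in enumerate(history):
        reason = stop_reason(r_loss, 0.20, f_acc,
                             retain_loss_budget=0.05,
                             forget_acc_target=0.01)
        if reason is not None:
            print(step, reason)
            break
```

Checking the retain budget first means a step that damages utility halts the loop even if the forget target is reached on the same step.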
Topics
- Machine Unlearning
- Continual Learning
- Gradient Projection
- Representation Erasure
- Data Privacy
- Membership Inference Attacks
Best for: Research Scientist, CTO, VP of Engineering/Data, AI Scientist, Machine Learning Engineer, AI Security Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.