Co-occurring Associated Retained Concepts in Diffusion Unlearning

2026-06-23 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

A new framework, ReCARE (Robust erasure for CARE), addresses a critical limitation of concept unlearning in diffusion models: benign concepts that co-occur with the target are often removed along with the harmful content. For instance, existing methods may suppress the concept of "person" when unlearning "nudity." The authors call these unintentionally suppressed concepts CARE (Co-occurring Associated REtained concepts) and introduce the CARE score to quantify how well they are preserved. ReCARE explicitly safeguards CARE by automatically constructing a CARE-set, a curated vocabulary of benign co-occurring tokens extracted from target images, and using this vocabulary during training to keep unlearning stable. Extensive experiments across diverse target concepts, including nudity, Van Gogh's style, and the object "tench" (a fish), show that ReCARE achieves state-of-the-art overall performance in balancing robust concept erasure, general utility, and CARE preservation.
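
The summary does not give the CARE score's formula. As a rough illustration only, the sketch below assumes a hypothetical definition: for each CARE concept, compare how often a detector finds it in images from the unlearned model versus the original model, clip that ratio to [0, 1], and average across concepts. The functions `detection_rate` and `care_score`, and the per-image boolean detector outputs they consume, are assumptions for illustration, not the paper's metric or API.

```python
from typing import Dict, List


def detection_rate(detections: List[bool]) -> float:
    """Fraction of generated images in which a concept was detected."""
    return sum(detections) / len(detections) if detections else 0.0


def care_score(original: Dict[str, List[bool]], unlearned: Dict[str, List[bool]]) -> float:
    """Hypothetical CARE-style preservation score in [0, 1].

    original / unlearned map each CARE concept to per-image detector outputs
    (True = concept present) for images from the original and unlearned models.
    1.0 means every concept appears at least as often as before unlearning.
    """
    ratios = []
    for concept, before in original.items():
        base = detection_rate(before)
        if base == 0.0:
            continue  # concept never detected in the original model's images
        after = detection_rate(unlearned.get(concept, []))
        ratios.append(min(1.0, after / base))  # clip over-generation to 1.0
    return sum(ratios) / len(ratios) if ratios else 0.0


if __name__ == "__main__":
    # Toy detector outputs for benign concepts after unlearning "nudity".
    original = {"person": [True, True, True, True], "bed": [True, True, False, False]}
    unlearned = {"person": [True, True, False, False], "bed": [True, True, False, False]}
    print(care_score(original, unlearned))  # 0.75
```

Here "person" falls to half its original detection rate (ratio 0.5) while "bed" is unchanged (ratio 1.0), so the sketch returns 0.75. Averaging per-concept ratios keeps one heavily suppressed concept visible in the score instead of letting frequent concepts dominate.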

Key takeaway

Machine learning engineers who develop or deploy diffusion models, especially for content moderation or fine-tuning, should evaluate ReCARE. The framework directly targets unintended concept suppression during unlearning, so a model can lose harmful content such as nudity, or a specific artistic style, without losing the ability to generate related benign elements. Adopting ReCARE can improve both model utility and safety, reducing the need for extensive re-training or manual content filtering after unlearning.

Key insights

Diffusion model unlearning often removes benign co-occurring concepts; ReCARE safeguards these concepts (CARE) while erasing only the target.

Principles

Unlearning must preserve Co-occurring Associated REtained concepts (CARE).
Quantify CARE preservation with a dedicated metric like the CARE score.
Explicitly safeguard benign co-occurring tokens during unlearning.
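
One common way to express "erase the target, safeguard CARE" is a two-term training objective. The sketch below is an assumption, not ReCARE's published loss: it borrows the negative-guidance erasure target from ESD ("Erasing Concepts from Diffusion Models", Gandikota et al.) and adds a preservation penalty that keeps the fine-tuned model's noise predictions for CARE-token prompts close to those of the frozen original model. Toy NumPy arrays stand in for noise predictions computed on the same noisy latent and timestep; `guidance` and `lam` are hypothetical weights.

```python
import numpy as np


def unlearning_loss(
    eps_student_target: np.ndarray,
    eps_teacher_target: np.ndarray,
    eps_teacher_uncond: np.ndarray,
    eps_student_care: np.ndarray,
    eps_teacher_care: np.ndarray,
    guidance: float = 1.0,
    lam: float = 1.0,
) -> float:
    """Hypothetical erase-plus-preserve objective (not ReCARE's published loss).

    eps_student_*: noise predictions from the model being fine-tuned.
    eps_teacher_*: noise predictions from the frozen original model.
    eps_*_care: stacked predictions, one leading row per CARE-token prompt.
    """
    # Erasure term (ESD-style negative guidance): push the student's prediction
    # for the target prompt toward the unconditional prediction shifted in the
    # direction opposite to the target concept.
    erase_goal = eps_teacher_uncond - guidance * (eps_teacher_target - eps_teacher_uncond)
    erase = np.mean((eps_student_target - erase_goal) ** 2)

    # Preservation term: keep the student's prediction for every CARE-token
    # prompt close to the frozen original model's prediction.
    preserve = np.mean((eps_student_care - eps_teacher_care) ** 2)

    return float(erase + lam * preserve)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    shape = (4, 8, 8)  # toy latent shape
    teacher_target = rng.normal(size=shape)
    teacher_uncond = rng.normal(size=shape)
    student_target = rng.normal(size=shape)
    teacher_care = rng.normal(size=(3, *shape))  # 3 hypothetical CARE tokens
    student_care = teacher_care + 0.1 * rng.normal(size=(3, *shape))
    print(unlearning_loss(student_target, teacher_target, teacher_uncond,
                          student_care, teacher_care, guidance=1.0, lam=0.5))
```

Raising `lam` trades erasure strength for CARE preservation; without the second term, nothing stops the fine-tuning from drifting on benign prompts such as "person."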

Method

ReCARE automatically constructs a CARE-set of benign co-occurring tokens from target images and leverages this vocabulary during training to achieve stable unlearning.
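
The CARE-set step can be illustrated with a deliberately simple stand-in. The sketch below assumes each target image already has a text caption (for example, from an off-the-shelf captioning model) and keeps words that appear in at least a fraction `min_frac` of captions, excluding stopwords and terms that name the target itself. The function `build_care_set`, the stopword list, and the threshold are hypothetical; the source does not say how ReCARE extracts or curates tokens, and a real pipeline would also need to screen candidates so that no harmful token enters the set.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Minimal illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"a", "an", "the", "of", "in", "on", "with", "and", "is", "at", "by"}


def build_care_set(captions: Iterable[str], target_terms: Set[str], min_frac: float = 0.3) -> List[str]:
    """Hypothetical CARE-set builder: frequent non-target words from captions."""
    captions = list(captions)
    doc_freq: Counter = Counter()
    for caption in captions:
        # Count each word at most once per caption (document frequency).
        doc_freq.update(set(re.findall(r"[a-z]+", caption.lower())))
    threshold = min_frac * len(captions)
    care = [
        tok for tok, n in doc_freq.items()
        if n >= threshold and tok not in STOPWORDS and tok not in target_terms
    ]
    return sorted(care)


if __name__ == "__main__":
    captions = [
        "a starry night sky over a village in van gogh style",
        "a field of sunflowers painted in van gogh style",
        "a village church under a night sky, van gogh style",
        "a portrait of a man with a hat in van gogh style",
    ]
    print(build_care_set(captions, {"van", "gogh", "style"}, min_frac=0.5))
    # ['night', 'sky', 'village']
```

For these four Van Gogh-style captions, the sketch keeps "night", "sky", and "village" and drops the style words themselves; these retained tokens are the kind of benign vocabulary a preservation term during training would protect.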

In practice

Apply ReCARE to unlearn harmful content such as nudity.
Use ReCARE to unlearn an artistic style (e.g., Van Gogh).
Use ReCARE to remove specific objects (e.g., tench).

Topics

Diffusion Models
Concept Unlearning
Harmful Content Mitigation
ReCARE Framework
CARE Score
Generative AI Safety

Best for: Research Scientist, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.