ICED: Concept-level Machine Unlearning via Interpretable Concept Decomposition
Summary
ICED (Interpretable ConcEpt Decomposition) is a machine unlearning framework for Vision-Language Models (VLMs) such as CLIP. It targets a limitation of instance-level unlearning, which often removes semantics unrelated to the forgetting target. ICED uses a multimodal large language model (MLLM) to construct a compact, task-specific concept vocabulary from the forgetting set. It then decomposes visual representations into sparse, non-negative combinations of these semantic concepts, which allows fine-grained manipulation of what the model knows. Unlearning is formulated as a concept-level optimization problem that selectively suppresses target concepts while preserving non-target semantics within the same instance and global cross-modal knowledge. Experiments on CIFAR-10 and ImageNet-1K with CLIP RN50 and RN101 backbones show that ICED forgets target concepts more completely and better preserves non-target knowledge and model utility than existing VLM unlearning methods, with a higher average score across the reported benchmarks.
Key takeaway
For research scientists and engineers developing or deploying Vision-Language Models, ICED offers a more precise approach to machine unlearning. If you need to remove specific concepts (e.g., sensitive data, copyrighted content) from a VLM without degrading its general utility or unrelated knowledge, consider concept-level decomposition and optimization. In the reported experiments, this approach causes less collateral damage than instance-level unlearning, which makes it a better fit for handling data removal requests.
Key insights
Concept-level unlearning in VLMs precisely removes target knowledge by decomposing visual representations into semantic concepts.
Principles
- Instance-level unlearning is too coarse for VLMs.
- Decompose visual representations into sparse concept bases.
- Balance forgetting, intra-instance, and global preservation.
Method
ICED constructs a task-specific concept vocabulary with an MLLM, aligns the visual and textual modalities, and decomposes visual features into sparse, non-negative combinations of concepts. It then optimizes three losses: one for forgetting, one for intra-instance preservation, and one for global knowledge retention.
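To make the decomposition step concrete, here is a minimal sketch of a sparse, non-negative decomposition of one feature vector over a set of concept embeddings. It is not the paper's implementation: the function name `decompose`, the least-squares objective, the solver (non-negative proximal gradient descent), and the parameters `l1` and `steps` are all assumptions chosen for illustration, and random vectors stand in for real CLIP embeddings.

```python
import numpy as np

def decompose(feature, concepts, l1=0.05, steps=500):
    """Approximate `feature` (d,) as concepts.T @ w with w >= 0 and sparse.

    concepts: (k, d) array, one concept embedding per row.
    Minimizes 0.5 * ||concepts.T @ w - feature||^2 + l1 * sum(w), w >= 0,
    with proximal gradient descent (an illustrative solver choice).
    """
    gram = concepts @ concepts.T            # (k, k) concept similarities
    target = concepts @ feature             # (k,) feature-concept similarities
    step = 1.0 / np.linalg.norm(gram, 2)    # 1 / Lipschitz constant of the gradient
    w = np.zeros(concepts.shape[0])
    for _ in range(steps):
        grad = gram @ w - target            # gradient of the quadratic term
        # Gradient step, l1 shrinkage, and projection onto w >= 0 in one line.
        w = np.maximum(w - step * (grad + l1), 0.0)
    return w

# Toy data: 8 random unit-norm "concepts" in 64 dimensions (stand-ins only).
rng = np.random.default_rng(0)
concepts = rng.normal(size=(8, 64))
concepts /= np.linalg.norm(concepts, axis=1, keepdims=True)
feature = 2.0 * concepts[0] + 1.0 * concepts[3]

weights = decompose(feature, concepts)
print(np.round(weights, 2))
```

In this toy setup the two largest weights fall on concepts 0 and 3, the ones used to build the feature. Raising `l1` pushes more weights to exactly zero at the cost of a looser reconstruction.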
In practice
- Use MLLMs to build task-specific concept vocabularies.
- Apply $\ell_1$ sparsity regularization for concept decomposition.
- Combine forgetting, intra-instance, and global preservation losses.
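The last point can be sketched as a single weighted objective. The summary does not give the paper's loss definitions, so every term below is an assumption: the forgetting term penalizes remaining weight on target concepts, the intra-instance term penalizes drift in non-target concept weights, and the global term penalizes changes in image-text similarity on retained data. The function name `unlearning_objective` and the weights `lam_intra` and `lam_global` are hypothetical.

```python
import numpy as np

def unlearning_objective(w_new, w_old, target_mask, sim_new, sim_old,
                         lam_intra=1.0, lam_global=1.0):
    """Illustrative three-term objective (not the paper's exact losses).

    w_new, w_old: (n, k) concept weights from the edited and original model.
    target_mask:  (k,) boolean array, True for concepts to forget.
    sim_new, sim_old: (m, t) image-text similarities on retained data.
    """
    # Forgetting: weight still assigned to target concepts.
    forget = np.mean(np.sum(w_new[:, target_mask], axis=1))
    # Intra-instance preservation: non-target weights should not move.
    diff = w_new[:, ~target_mask] - w_old[:, ~target_mask]
    intra = np.mean(np.sum(diff ** 2, axis=1))
    # Global preservation: cross-modal similarities should not move.
    global_ = np.mean((sim_new - sim_old) ** 2)
    total = forget + lam_intra * intra + lam_global * global_
    return total, {"forget": forget, "intra": intra, "global": global_}

# Toy values: concept 0 is the target; the edited model suppresses it.
w_old = np.array([[0.8, 0.3, 0.0], [0.6, 0.0, 0.4]])
w_new = np.array([[0.1, 0.3, 0.0], [0.0, 0.0, 0.4]])
target_mask = np.array([True, False, False])
sim = np.array([[0.9, 0.1], [0.2, 0.7]])

total, parts = unlearning_objective(w_new, w_old, target_mask, sim, sim)
print(total, parts)
```

Here only the forgetting term is non-zero, because the non-target weights and the similarities are unchanged. In a real training loop these terms would be computed on model outputs in an autodiff framework and the two weights tuned to trade forgetting against preservation.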
Topics
- Machine Unlearning
- Vision-Language Models
- Interpretable Concept Decomposition
- CLIP Models
- Multimodal Large Language Models
Best for: Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.