ClickRemoval: An Interactive Open-Source Tool for Object Removal in Diffusion Models
Summary
ClickRemoval is an open-source, interactive object removal tool built on pretrained Stable Diffusion models (SD1.5, SD2.1, SDXL1.0) that operates solely via user clicks, eliminating the need for manual masks, text prompts, or additional training. The tool localizes target objects and restores backgrounds by modulating self-attention during the denoising process. It converts user clicks into semantic maps using M2N2, then employs Self-Guided Attention Redirection (SGAR) and Self-Guided Attention Scheduling (SGAS) to modify attention distributions and control modulation strength across denoising steps. Attention Redirection Guidance (ARG) further blends original and modulated noise predictions for controllable removal strength. ClickRemoval achieves competitive quantitative results, including FID 8.05 and Local-FID 15.56 at 1024 resolution with SDXL1.0, and ranks first in user preference studies across resolutions.
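ClickRemoval turns clicks into semantic maps with M2N2. M2N2's internals are not described here, so the sketch below is a deliberately simplified stand-in and not M2N2 itself. It assumes a row-stochastic self-attention matrix `attn` over the H×W latent tokens and click coordinates given in token space. It spreads probability mass from the clicked tokens along attention edges for a few Markov steps, then keeps the tokens that positive clicks reach more strongly than negative ones. The function name and the `steps` and `tau` parameters are hypothetical.

```python
import numpy as np


def click_to_semantic_map(attn, pos_clicks, neg_clicks, grid_hw, steps=3, tau=0.5):
    """Turn clicks into a binary object map by propagating them over attention.

    Simplified stand-in for click-to-map conversion (NOT the M2N2 algorithm).
    Assumes attn is a row-stochastic (N, N) self-attention matrix over the
    N = H * W latent tokens and that clicks are (row, col) token coordinates.
    """
    h, w = grid_hw
    n = h * w
    if attn.shape != (n, n):
        raise ValueError("attn must have shape (H*W, H*W)")

    def propagate(points):
        # Start a Markov chain with uniform mass on the clicked tokens.
        p = np.zeros(n)
        for r, c in points:
            p[r * w + c] = 1.0
        if p.sum() == 0:
            return p  # no clicks of this kind
        p /= p.sum()
        for _ in range(steps):
            p = p @ attn  # one transition along attention edges
        return p

    pos = propagate(pos_clicks)
    neg = propagate(neg_clicks)
    if pos.max() == 0:
        raise ValueError("at least one positive click is required")
    # Keep tokens reached more strongly from positive than from negative clicks
    # and above a fraction tau of the strongest positive response.
    mask = (pos > neg) & (pos >= tau * pos.max())
    return mask.reshape(h, w)
```

On a toy 2×4 grid whose attention stays inside the left and right halves, one positive click at `(0, 0)` selects the left half. Adding a negative click at `(0, 3)` leaves that selection unchanged.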
Key takeaway
For research scientists developing image editing tools, ClickRemoval demonstrates a training-free approach to object removal that needs only click interactions. Consider integrating self-attention modulation and staged guidance mechanisms into your diffusion pipelines to improve usability and removal quality without extensive retraining, and use its open-source framework as a baseline for more intuitive and precise content-creation tools.
Key insights
Click-driven self-attention modulation in diffusion models enables training-free, precise object removal.
Principles
- Clicks can generate semantic maps for object localization.
- Modulating self-attention during denoising guides object removal.
- Staged scheduling of guidance improves background naturalness.
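The summary does not give SGAS's actual schedule. The sketch below shows one plausible staged shape, which is purely an assumption: full modulation strength in the early, noisy steps, a linear decay in the middle, and no modulation in the final steps so the background can refine naturally. The function name and the `hold_frac` and `stop_frac` breakpoints are hypothetical.

```python
def staged_strength(step, num_steps, peak=1.0, hold_frac=0.3, stop_frac=0.7):
    """Hypothetical staged schedule for attention-modulation strength.

    step counts denoising steps from 0 (noisiest) to num_steps - 1 (cleanest).
    Stage 1 holds full strength, stage 2 decays linearly, and stage 3 turns the
    modulation off so the late steps refine the background unmodified.
    """
    if not 0 <= step < num_steps:
        raise ValueError("step out of range")
    if not 0 <= hold_frac < stop_frac <= 1:
        raise ValueError("need 0 <= hold_frac < stop_frac <= 1")
    frac = step / num_steps
    if frac < hold_frac:
        return peak  # stage 1: hold
    if frac >= stop_frac:
        return 0.0  # stage 3: off
    # Stage 2: linear decay from peak at hold_frac to 0 at stop_frac.
    return peak * (stop_frac - frac) / (stop_frac - hold_frac)
```

For 10 steps this produces a non-increasing schedule of 1.0 for steps 0 to 3, then about 0.75, 0.5, and 0.25, then 0.0 from step 7 onward.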
Method
ClickRemoval uses M2N2 to extract semantic maps from clicks, SGAR to redirect self-attention, SGAS to schedule guidance strength across denoising steps, and ARG to blend original and modulated noise predictions, which sets the removal strength.
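SGAR's exact formulation is not given in this summary. A minimal sketch of the general idea of attention redirection, under stated assumptions, follows. Queries are discouraged from attending to keys inside the object mask, so every token, including those in the object region, draws its features from the background. The function name, the `(N, d)` array layout, and the `strength` knob are hypothetical. A value like the SGAS-style schedule could feed `strength` at each step.

```python
import numpy as np


def redirected_self_attention(q, k, v, obj_mask, strength=1.0):
    """Self-attention with object-region keys down-weighted.

    Hypothetical sketch of attention redirection (not the paper's SGAR):
    q, k, v are (N, d) arrays over N latent tokens, obj_mask is a boolean (N,)
    array marking object tokens, and strength in [0, 1] interpolates from
    ordinary attention (0) to fully excluding object keys (1).
    """
    obj_mask = np.asarray(obj_mask, dtype=bool)
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    if strength == 1.0 and obj_mask.all():
        raise ValueError("need at least one background key")
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)
    # Softmax numerators, shifted by the row max for numerical stability.
    weights = np.exp(logits - logits.max(axis=1, keepdims=True))
    # Scale the unnormalised mass on object keys by (1 - strength).
    weights[:, obj_mask] *= 1.0 - strength
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ v, weights
```

Scaling the softmax numerators, rather than adding a fixed large negative bias, keeps `strength` meaningful across its whole range. Intermediate values partially suppress the object instead of acting as an on/off switch.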
In practice
- Use positive clicks to mark the target object and negative clicks to exclude regions that should be kept, for more precise selection.
- Adjust ARG's 'r' coefficient to control removal strength.
- Use SD1.5 for real-time removal and SDXL1.0 for high-quality removal.
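The summary says ARG blends the original and modulated noise predictions through a coefficient `r`, but it does not give the formula. The sketch below assumes a classifier-free-guidance-style linear blend, `eps = eps_orig + r * (eps_mod - eps_orig)`. That form is an assumption, not the paper's confirmed equation. Under it, `r = 0` keeps the original prediction, `r = 1` uses the fully modulated one, and `r > 1` extrapolates toward stronger removal. The function name is hypothetical.

```python
import numpy as np


def arg_blend(eps_orig, eps_mod, r):
    """Hypothetical ARG-style blend of two noise predictions.

    Assumes a CFG-like linear form: eps_orig + r * (eps_mod - eps_orig), where
    eps_orig is the unmodified prediction and eps_mod the attention-modulated one.
    """
    eps_orig = np.asarray(eps_orig, dtype=float)
    eps_mod = np.asarray(eps_mod, dtype=float)
    if eps_orig.shape != eps_mod.shape:
        raise ValueError("noise predictions must have the same shape")
    return eps_orig + r * (eps_mod - eps_orig)
```

With `eps_orig` all zeros and `eps_mod` all ones, `r = 0.5` returns 0.5 everywhere and `r = 2.0` returns 2.0. The removal-strength knob therefore acts as a single scalar per denoising step.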
Topics
- ClickRemoval
- Object Removal
- Diffusion Models
- Self-Attention Modulation
- Interactive Image Editing
Best for: Research Scientist, Computer Vision Engineer, Machine Learning Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.