ClickRemoval: An Interactive Open-Source Tool for Object Removal in Diffusion Models

Source: cs.CV updates on arXiv.org · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computer Vision) · Depth: Advanced, long

Summary

ClickRemoval is an open-source, interactive object removal tool built on pretrained Stable Diffusion models (SD1.5, SD2.1, SDXL1.0) that operates solely via user clicks, eliminating the need for manual masks, text prompts, or additional training. The tool localizes target objects and restores backgrounds by modulating self-attention during the denoising process. It converts user clicks into semantic maps using M2N2, then employs Self-Guided Attention Redirection (SGAR) and Self-Guided Attention Scheduling (SGAS) to modify attention distributions and control modulation strength across denoising steps. Attention Redirection Guidance (ARG) further blends original and modulated noise predictions for controllable removal strength. ClickRemoval achieves competitive quantitative results, including FID 8.05 and Local-FID 15.56 at 1024 resolution with SDXL1.0, and ranks first in user preference studies across resolutions.
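To make "modulating self-attention" concrete, here is a minimal sketch of one way attention can be steered away from a clicked object: the logits of keys that fall on the object are lowered before the softmax, so each query draws more of its attention from the background. This is an illustration, not the paper's SGAR formulation. The function name, the `penalty` parameter, and the subtract-then-softmax rule are all assumptions, and the object mask is taken as given rather than produced by M2N2.

```python
import math

def redirect_self_attention(scores, object_mask, penalty=10.0):
    """Row-wise softmax over self-attention logits with object keys penalized.

    scores:      N x N list of pre-softmax logits (query rows, key columns)
    object_mask: length-N list in [0, 1]; 1 marks a token on the clicked object
    penalty:     hypothetical logit penalty; 0.0 gives an ordinary softmax
    """
    weights = []
    for row in scores:
        # Lower the logit of every key that lies on the object (assumed rule).
        shifted = [s - penalty * m for s, m in zip(row, object_mask)]
        # Subtract the row maximum for numerical stability.
        top = max(shifted)
        exps = [math.exp(s - top) for s in shifted]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights
```

Each row still sums to 1 after the penalty, so the attention that object tokens lose is redistributed to background tokens. That redistribution is the intuition behind restoring the background from its surroundings.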

Key takeaway

For research scientists developing image editing tools, ClickRemoval demonstrates a training-free approach to object removal driven only by click interactions. Consider integrating self-attention modulation and scheduled guidance into your diffusion pipelines to improve usability and removal quality without retraining, and treat the open-source framework as a baseline for more intuitive and precise content creation tools.

Key insights

Click-driven self-attention modulation in diffusion models enables training-free, precise object removal.

Principles

Method

ClickRemoval uses M2N2 to convert user clicks into semantic maps, SGAR to redirect self-attention, SGAS to schedule modulation strength across denoising steps, and ARG to blend original and modulated noise predictions for controllable removal strength.
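The scheduling and blending steps can be pictured with a small sketch. The summary does not give the actual SGAS schedule or ARG formula, so the linear decay and the classifier-free-guidance-style blend below are assumptions, as are the function names and parameters.

```python
def modulation_schedule(step, num_steps, start=1.0, end=0.0):
    """Hypothetical linear schedule for modulation strength over denoising steps.

    The direction (strong early, weak late) and the linear shape are assumptions.
    """
    if num_steps <= 1:
        return start
    frac = step / (num_steps - 1)
    return start + (end - start) * frac

def blend_noise(eps_original, eps_modulated, guidance):
    """Assumed CFG-style blend of two noise predictions.

    guidance = 0 returns the original prediction, 1 returns the modulated one,
    and values above 1 extrapolate past the modulated prediction.
    """
    return [o + guidance * (m - o) for o, m in zip(eps_original, eps_modulated)]
```

In a sampling loop, a value from `modulation_schedule` would set how strongly attention is modulated at each step, while `guidance` would act as the user-facing knob for removal strength.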

Best for: Research Scientist, Computer Vision Engineer, Machine Learning Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.