Consistent-Inversion: Reverse Consistency Guidance for Structure-Preserving Visual Editing

Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

Consistent-Inversion is a training-free reverse consistency guidance framework for structure-preserving visual editing with text-guided diffusion models. It addresses the trajectory mismatch in inversion-based editors by no longer treating the inverted source latent as a fixed initialization. Instead, the method constructs an auxiliary target-side noise representation, performs source-guided reverse denoising, and uses the resulting reverse consistency discrepancy as a correction signal for selected early target denoising steps. Experiments on PIE-Bench under a unified SD3.5 protocol show that Consistent-Inversion improves background and structural fidelity, reducing BG-LPIPS from 0.2194 to 0.2051 and LPIPS from 0.4409 to 0.4122 (lower is better), while maintaining target-prompt alignment. It is compatible with existing inversion-based editors and adds only a small inference overhead, raising runtime from 5.85s to 6.05s relative to Direct Inversion.
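To put the reported figures on a common scale, the short calculation below converts each before/after pair from the summary into a relative change. The metric values are the ones quoted above; the helper name `relative_change` and the dictionary layout are illustrative choices, not part of the paper.

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after` (negative means a reduction)."""
    return (after - before) / before


# (before, after) pairs as quoted in the summary above.
reported = {
    "BG-LPIPS": (0.2194, 0.2051),
    "LPIPS": (0.4409, 0.4122),
    "runtime_s": (5.85, 6.05),
}

changes = {name: relative_change(b, a) for name, (b, a) in reported.items()}

for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```

Both perceptual-distance metrics drop by roughly 6.5%, while runtime grows by about 3.4%.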

Key takeaway

For machine learning engineers building real-image editing systems, Consistent-Inversion offers a practical way to improve structural preservation without any retraining. Because the guidance is training-free, it can be added to an existing inversion-based pipeline as sparse corrections applied at early timesteps. This improves background and layout fidelity with minimal runtime overhead, keeping edits consistent with the source structure while still achieving the target semantic change.

Key insights

Reverse consistency guidance corrects structural drift in diffusion-based image editing by checking trajectory reversibility.

Method

Construct an auxiliary target-side noise representation, perform source-guided reverse denoising, compute the discrepancy, and inject this offset into selected early target denoising steps.
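The control flow of these four steps can be sketched on a toy problem. This is a minimal illustration under loud assumptions, not the authors' implementation: the linear `velocity` function stands in for a real diffusion or flow model, the auxiliary noise is formed by a single approximate reverse step, and every name here (`reverse_consistency_offset`, `edit`, `guided_steps`, `scale`) is hypothetical. The summary does not specify how the auxiliary noise or the discrepancy is actually computed.

```python
import numpy as np


def velocity(x, cond):
    # Toy stand-in for a prompt-conditioned model: pulls x toward `cond`.
    return cond - x


def denoise_step(x, cond, dt):
    # One Euler step from noise toward the image under `cond`.
    return x + dt * velocity(x, cond)


def reverse_step(x, cond, dt):
    # One approximate Euler step back toward noise under `cond`.
    return x - dt * velocity(x, cond)


def reverse_consistency_offset(x, src_cond, tgt_cond, dt):
    # Assumed reading of the method: build an auxiliary target-side noise,
    # denoise it under the source condition, and measure the discrepancy.
    aux_noise = reverse_step(x, tgt_cond, dt)
    x_roundtrip = denoise_step(aux_noise, src_cond, dt)
    return x_roundtrip - x


def edit(z_src, src_cond, tgt_cond, num_steps=10, guided_steps=(0, 1, 2), scale=1.0):
    dt = 1.0 / num_steps
    x = z_src.copy()
    for k in range(num_steps):
        if k in guided_steps:  # sparse correction at early steps only
            x = x + scale * reverse_consistency_offset(x, src_cond, tgt_cond, dt)
        x = denoise_step(x, tgt_cond, dt)
    return x


# Dimension 0 plays the role of shared structure, dimension 1 the edited attribute.
z_src = np.zeros(2)                 # stands in for the inverted source latent
src_cond = np.array([1.0, 0.0])
tgt_cond = np.array([1.0, 1.0])

source_image = edit(z_src, src_cond, src_cond, guided_steps=())  # reconstruction
baseline = edit(z_src, src_cond, tgt_cond, guided_steps=())      # no guidance
guided = edit(z_src, src_cond, tgt_cond)                         # early-step guidance

print("distance to source, baseline:", np.linalg.norm(baseline - source_image))
print("distance to source, guided:  ", np.linalg.norm(guided - source_image))
```

In this toy setting the guided result ends closer to the source reconstruction than the unguided one while still moving toward the target condition. Restricting the correction to a few early steps keeps the extra cost small, which mirrors the sparse, early-timestep design described above.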

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.