Consistent-Inversion: Reverse Consistency Guidance for Structure-Preserving Visual Editing

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, extended

Summary

Consistent-Inversion introduces a training-free reverse consistency guidance framework for structure-preserving visual editing with text-guided diffusion models. It targets the trajectory mismatch in inversion-based editors: rather than treating the inverted source latent as a fixed initialization, the method constructs an auxiliary target-side noise representation, performs source-guided reverse denoising, and uses the resulting reverse consistency discrepancy as a correction signal at selected early target denoising steps. Experiments on PIE-Bench under a unified SD3.5 protocol show that Consistent-Inversion improves background and structural fidelity, reducing BG-LPIPS from 0.2194 to 0.2051 and LPIPS from 0.4409 to 0.4122 while maintaining target-prompt alignment. It is compatible with existing inversion-based editors and adds only a small inference overhead, increasing runtime from 5.85 s to 6.05 s over Direct Inversion.
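To put the reported figures on a common scale, the short calculation below converts each before/after pair into a relative change. It uses only the numbers quoted in the summary; the helper name `relative_change` and the dictionary layout are illustrative choices, not anything from the paper.

```python
def relative_change(before: float, after: float) -> float:
    """Signed relative change from `before` to `after`, as a fraction of `before`."""
    return (after - before) / before


# Figures quoted in the summary (PIE-Bench, unified SD3.5 protocol).
reported = {
    "BG-LPIPS": (0.2194, 0.2051),
    "LPIPS": (0.4409, 0.4122),
    "runtime_s": (5.85, 6.05),
}

changes = {name: relative_change(b, a) for name, (b, a) in reported.items()}

for name, change in changes.items():
    # Negative values are reductions (lower LPIPS is better); positive is added cost.
    print(f"{name}: {change:+.1%}")
```

Read this way, both perceptual-distance metrics drop by roughly 6.5% for about 3.4% more inference time.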

Key takeaway

For machine learning engineers building real-image editing systems, Consistent-Inversion offers a practical way to improve structural preservation without retraining. Because the guidance is training-free, it can be added to an existing inversion-based pipeline as sparse corrections at early timesteps. This improves background and layout fidelity with minimal runtime overhead, keeping edits consistent with the source structure while still achieving the target semantic change.

Key insights

Reverse consistency guidance corrects structural drift in diffusion-based image editing by checking trajectory reversibility.

Principles

Inversion-based editing creates a trajectory mismatch between source reconstruction and target modification.
Early denoising stages primarily establish global layout and low-frequency structure.
Structural drift can be estimated by reversing an intermediate target state back to the source trajectory.

Method

Construct an auxiliary target-side noise representation, perform source-guided reverse denoising, compute the discrepancy, and inject this offset into selected early target denoising steps.
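The summary does not give the update equations, so the following is only a toy sketch of one possible reading of these steps, not the authors' implementation. It replaces the diffusion model with a linear stand-in (`denoise_step`), generates the source trajectory forward instead of inverting a real image, and merges the auxiliary-noise construction and the source-guided reverse step into a single call (`reverse_step`). All function names, the constant `RATE`, and the chosen steps and strength are assumptions made for illustration.

```python
import numpy as np

RATE = 0.25  # toy per-step denoising rate (assumption, not from the paper)


def denoise_step(z, cond):
    """Toy denoiser: move the latent a fixed fraction toward the conditioning."""
    return z + RATE * (cond - z)


def reverse_step(z_next, cond):
    """Exact inverse of denoise_step under `cond` (one step back toward noise)."""
    return (z_next - RATE * cond) / (1.0 - RATE)


def source_trajectory(z_start, src_cond, num_steps):
    """Stand-in for the stored inversion trajectory of the source image."""
    traj = [z_start]
    for _ in range(num_steps):
        traj.append(denoise_step(traj[-1], src_cond))
    return traj


def edit(src_traj, src_cond, tgt_cond, correct_steps=(), strength=0.0):
    """Denoise under the target condition, correcting selected early steps."""
    z = src_traj[0].copy()
    for k in range(len(src_traj) - 1):
        z_next = denoise_step(z, tgt_cond)
        if k in correct_steps:
            # Map the target state one step back under the SOURCE condition.
            z_back = reverse_step(z_next, src_cond)
            # Discrepancy against the source trajectory at the same noise level.
            delta = src_traj[k] - z_back
            # Inject the offset into the target state.
            z_next = z_next + strength * delta
        z = z_next
    return z


rng = np.random.default_rng(0)
src_cond, tgt_cond, noise = rng.normal(size=(3, 16))
traj = source_trajectory(noise, src_cond, num_steps=8)

plain = edit(traj, src_cond, tgt_cond)
guided = edit(traj, src_cond, tgt_cond, correct_steps=(0, 1), strength=0.3)

# Distance of each edited result from the source reconstruction.
drift_plain = np.linalg.norm(plain - traj[-1])
drift_guided = np.linalg.norm(guided - traj[-1])
print(drift_guided < drift_plain)
```

In this toy setting, correcting only the first two steps leaves the edited result closer to the source reconstruction while it still moves toward the target condition. A real pipeline would substitute the model's own inversion and sampling steps for the linear stand-ins.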

In practice

Apply correction sparsely at early timesteps for efficiency and structural benefit.
Combine with existing attention-based or feature-injection editors.
Configure correction strength and timesteps based on latency and preservation needs.
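A small configuration object is one way to make these knobs explicit. The sketch below is hypothetical: the class name, field names, and default values (including the 28-step schedule and the 0.3 strength) are placeholders chosen for illustration and are not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class CorrectionConfig:
    """Hypothetical settings for sparse, early-timestep correction."""

    num_steps: int = 28           # total denoising steps (placeholder)
    early_fraction: float = 0.25  # share of the schedule treated as "early"
    stride: int = 2               # correct every `stride`-th early step
    strength: float = 0.3         # scale applied to the correction offset

    def __post_init__(self):
        if self.num_steps < 1 or self.stride < 1:
            raise ValueError("num_steps and stride must be positive")
        if not 0.0 <= self.early_fraction <= 1.0:
            raise ValueError("early_fraction must lie in [0, 1]")

    def correction_steps(self):
        """Indices of the denoising steps (0 = noisiest) that get a correction."""
        cutoff = math.ceil(self.early_fraction * self.num_steps)
        return list(range(0, cutoff, self.stride))


low_latency = CorrectionConfig(stride=4)
high_fidelity = CorrectionConfig(early_fraction=0.5, stride=1)
print(low_latency.correction_steps(), len(high_fidelity.correction_steps()))
```

Each corrected step adds extra model evaluations, so a larger `stride` or smaller `early_fraction` trades preservation for latency; the right balance depends on the editor being extended.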

Topics

Diffusion Models
Image Editing
Structure Preservation
Inversion-Based Editing
Consistency Guidance
Latent Space

Best for: Research Scientist, AI Scientist, Machine Learning Engineer, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.