Edit the Bits, Diff the Codes: Bitwise Residual Editing for Visual Autoregressive Models

Source: Computation and Language · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Computer Vision) · Depth: Expert, quick

Summary

BitResEdit is a training-free editor for bitwise-residual visual autoregressive (VAR) generators such as Infinity. Existing VAR editors operate mainly on token streams or features; BitResEdit instead works with the generator's native per-bit Bernoulli prediction head and its additive multi-scale residual code field. It combines two components. BitEdit performs source-negative guidance by adjusting post-CFG per-bit log-odds within a Bernoulli-KL trust region. ResEdit converts sampled bits into continuous-code residuals, gates them with a localization mask, and re-injects them through the generator's sum-of-scales. Coupling decision-time bit guidance with combination-time code composition preserves masked-out latent features exactly while applying localized, scale-aware edits. On PIE-Bench with Infinity-2B, BitResEdit reports stronger text alignment, improving the CLIP score on edited regions by +1.07 over the strongest prior editor, with competitive background preservation.
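To make the BitEdit idea concrete, here is a minimal single-bit sketch in plain Python. It is an illustration under stated assumptions, not the paper's implementation: the guidance rule `l_tgt + lam * (l_tgt - l_src)`, the KL direction, the bisection used to enforce the trust region, and the names `guided_logit`, `lam`, and `eps` are all hypothetical choices made for this example.

```python
import math

def sigmoid(x):
    # Numerically stable logistic function.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)

def bernoulli_kl(q, p, tiny=1e-12):
    # KL( Bern(q) || Bern(p) ), with both probabilities clamped away from 0 and 1.
    q = min(max(q, tiny), 1.0 - tiny)
    p = min(max(p, tiny), 1.0 - tiny)
    return q * math.log(q / p) + (1.0 - q) * math.log((1.0 - q) / (1.0 - p))

def guided_logit(l_tgt, l_src, lam=2.0, eps=0.05, iters=40):
    """Source-negative guidance for one bit, limited by a Bernoulli-KL budget.

    l_tgt: post-CFG log-odds of the bit under the target prompt.
    l_src: log-odds of the same bit under the source prompt.
    The guidance form and the KL direction are assumptions of this sketch.
    """
    step = lam * (l_tgt - l_src)          # push away from the source prediction
    p_ref = sigmoid(l_tgt)                # unguided reference distribution
    if bernoulli_kl(sigmoid(l_tgt + step), p_ref) <= eps:
        return l_tgt + step               # full step already inside the trust region
    # KL grows monotonically along the step direction, so bisect on the step fraction.
    lo, hi = 0.0, 1.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bernoulli_kl(sigmoid(l_tgt + mid * step), p_ref) <= eps:
            lo = mid
        else:
            hi = mid
    return l_tgt + lo * step
```

When source and target agree on a bit, the step is zero and the log-odds pass through unchanged; when they disagree strongly, the step is shortened so the guided distribution stays within `eps` of the unguided one.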

Key takeaway

For computer vision engineers building text-guided image editing on visual autoregressive models, BitResEdit is worth evaluating: it needs no training, keeps edits localized, and preserves masked-out latent features exactly. On PIE-Bench with Infinity-2B it improves the CLIP score on edited regions by +1.07 over the strongest prior editor, which makes it a strong candidate where edit fidelity and control matter.

Key insights

BitResEdit improves visual autoregressive editing by operating directly on per-bit predictions and the multi-scale residual code field, which enables precise, localized changes.

Method

BitResEdit uses source-negative guidance on per-bit log-odds within a Bernoulli-KL trust region (BitEdit), then converts sampled bits to masked, scale-aware continuous-code residuals for re-injection (ResEdit).
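The ResEdit half of the method can be sketched on a toy 1-D latent in plain Python. This is an illustration under assumptions rather than the paper's code: a plain 0/1 to -1/+1 mapping stands in for the tokenizer's actual quantizer, upsampling is nearest-neighbour, the mask is downsampled conservatively (a coarse cell is editable only if every position it covers is editable), and the helper names `bits_to_code`, `downsample_mask`, and `compose` are hypothetical.

```python
def bits_to_code(bits, scale=1.0):
    # Map {0, 1} bits to {-scale, +scale} code values (stand-in quantizer, assumed).
    return [scale * (2 * b - 1) for b in bits]

def upsample_nearest(values, size):
    # Nearest-neighbour upsampling of a 1-D list to `size` entries.
    n = len(values)
    return [values[i * n // size] for i in range(size)]

def downsample_mask(mask, n):
    # Conservative gate: a coarse cell is 1.0 only if all fine positions it covers are 1.
    size = len(mask)
    gate = []
    for j in range(n):
        covered = [mask[i] for i in range(size) if i * n // size == j]
        gate.append(1.0 if all(c == 1.0 for c in covered) else 0.0)
    return gate

def compose(src_codes, edit_bits, mask, size):
    """Sum-of-scales latent with mask-gated code residuals.

    src_codes: per-scale source code values, coarse to fine.
    edit_bits: per-scale sampled bits for the edit, same shapes as src_codes.
    mask:      full-resolution localization mask (1.0 = editable).
    """
    latent = [0.0] * size
    for src, bits in zip(src_codes, edit_bits):
        gate = downsample_mask(mask, len(src))
        edit = bits_to_code(bits)
        # Gated residual: source code plus mask * (edited code - source code).
        merged = [s + g * (e - s) for s, e, g in zip(src, edit, gate)]
        up = upsample_nearest(merged, size)
        latent = [a + b for a, b in zip(latent, up)]
    return latent

# Toy usage: three scales (1, 2, 4 cells), edit allowed only on the right half.
src_codes = [[1.0], [-1.0, 1.0], [1.0, -1.0, -1.0, 1.0]]
edit_bits = [[0], [1, 0], [0, 1, 1, 0]]
source = compose(src_codes, edit_bits, [0.0, 0.0, 0.0, 0.0], 4)
edited = compose(src_codes, edit_bits, [0.0, 0.0, 1.0, 1.0], 4)
```

Because a gate of zero leaves each source code value untouched before the sum, positions outside the mask come out bit-for-bit identical to the source latent in this sketch, which mirrors the exact-preservation property described above. The conservative downsampling is one design option: it keeps coarse-scale edits from leaking outside the mask, at the cost of making no coarse-scale change for small masks.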

Best for: Research Scientist, AI Scientist, Computer Vision Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computation and Language.