Unified Safe In-context Image Generation in Multimodal Diffusion Transformers via Restricting Unsafe Information Flows

2026-06-08 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cybersecurity & Data Privacy · Depth: Expert, extended

Summary

Unified Visual Safety Regulator (UVR) is a training-free framework for preventing harmful content generation in Multimodal Diffusion Transformers (MM-DiTs). It particularly addresses the limitations of existing safety mechanisms in image-to-image (I2I) editing tasks. Based on an analysis of attention dynamics in multimodal attention (MM-Attn), UVR identifies a task-independent "semantic start-up stage" in which unsafe semantics emerge rapidly and can be localized. It then mitigates harmful generation with unified, targeted attention modulation and explicit restriction of unsafe information flow over the identified output patches. In experiments on FLUX.1-dev and FLUX.1-Kontext-dev covering nudity, IP characters, and inappropriate objects, UVR reports state-of-the-art safety performance, with erase rates of 91% in image synthesis and 77% in image editing, while preserving visual quality and fidelity.

Key takeaway

For AI security engineers or machine learning engineers adding safety mechanisms to multimodal diffusion models, UVR is a training-free option that covers both text-to-image synthesis and image-to-image editing with a single mechanism. The reported erase rates are 91% for synthesis and 77% for editing, with visual quality and fidelity preserved. Consider evaluating UVR for inference-time safety control in FLUX-series or similar DiT architectures, especially for context-insensitive risks such as explicit content or intellectual property violations.

Key insights

UVR unifies safety for image synthesis and editing in multimodal DiTs by modulating attention to restrict unsafe information flow early in generation, over localized output patches.

Principles

Unsafe semantics emerge early in DiT generation.
Attention dynamics reveal task-independent and task-specific stages.
Targeted attention modulation can block harmful content.
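
The third principle can be illustrated with a toy example. The sketch below is not UVR's actual modulation rule, which this summary does not specify. It assumes one simple way to restrict information flow: subtracting a large penalty from the pre-softmax attention scores that connect flagged output patches (queries) to unsafe tokens (keys). The function names, the `penalty` value, and the index sets are all hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def modulated_attention(logits, unsafe_queries, unsafe_keys, penalty=1e9):
    """Return attention weights with unsafe-key -> flagged-query flow suppressed.

    logits[q][k] is the pre-softmax score of query q on key k.
    unsafe_queries: indices of output patches flagged as unsafe (assumed given).
    unsafe_keys: indices of tokens carrying the unsafe concept (assumed given).
    penalty: hypothetical constant; a large finite value avoids NaNs from -inf.
    """
    weights = []
    for q, row in enumerate(logits):
        if q in unsafe_queries:
            # Only flagged queries are modified; other rows stay untouched.
            row = [x - penalty if k in unsafe_keys else x
                   for k, x in enumerate(row)]
        weights.append(softmax(row))
    return weights


if __name__ == "__main__":
    scores = [[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]]
    print(modulated_attention(scores, unsafe_queries={0}, unsafe_keys={2}))
```

Because the penalty is applied before the softmax, the remaining weights in a flagged row renormalize to sum to one, so the patch still attends to the safe context instead of receiving no signal at all.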

Method

UVR localizes unsafe visual patches using pre-collected "unsafe anchors" and then regulates them via adaptive attention modulation and explicit restriction of harmful information flows, primarily during the semantic start-up stage.
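
As a rough illustration of the localization step, the sketch below scores each output patch against a set of pre-collected unsafe anchors and flags the patches that match closely. The summary does not state the similarity measure or threshold UVR uses, so cosine similarity, the `threshold` value, and the function names here are assumptions made for the example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def localize_unsafe_patches(patch_features, unsafe_anchors, threshold=0.8):
    """Return one boolean per patch: True if it resembles any unsafe anchor.

    patch_features: list of per-patch feature vectors (hypothetical input).
    unsafe_anchors: list of pre-collected anchor vectors for an unsafe concept.
    threshold: hypothetical cut-off on the best anchor similarity.
    """
    mask = []
    for feat in patch_features:
        best = max(cosine(feat, anchor) for anchor in unsafe_anchors)
        mask.append(best >= threshold)
    return mask


if __name__ == "__main__":
    patches = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
    anchors = [[1.0, 0.0]]
    print(localize_unsafe_patches(patches, anchors))
```

The resulting mask is what a regulation step would consume: only patches marked True would have their attention modulated, which is how quality in the rest of the image could be left intact.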

In practice

Construct unsafe anchors from final diffusion timestep outputs.
Apply spatial refinement to localization masks.
Inject Gaussian noise into core unsafe tokens early.
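
The last two steps above can be sketched on a toy patch grid. This is a minimal sketch under stated assumptions, not the paper's procedure: spatial refinement is approximated by a 4-neighbour dilation of the mask, and "early" is modelled as a hypothetical `early_steps` cut-off on the denoising step index. The noise scale `sigma` and both function names are likewise illustrative.

```python
import random


def dilate_mask(mask, height, width):
    """4-neighbour dilation of a flat, row-major boolean mask (assumed refinement)."""
    out = list(mask)
    for i in range(height):
        for j in range(width):
            if mask[i * width + j]:
                for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ni, nj = i + di, j + dj
                    if 0 <= ni < height and 0 <= nj < width:
                        out[ni * width + nj] = True
    return out


def inject_noise(tokens, mask, step, early_steps=5, sigma=1.0, rng=None):
    """Add Gaussian noise to masked tokens, but only during early steps.

    tokens: list of per-patch feature vectors.
    mask: one boolean per token marking the core unsafe tokens.
    step: current denoising step index, counted from 0.
    early_steps, sigma: hypothetical values chosen for the example.
    """
    rng = rng or random.Random(0)
    if step >= early_steps:
        return [list(t) for t in tokens]  # past the early window: unchanged copy
    return [
        [x + rng.gauss(0.0, sigma) for x in t] if m else list(t)
        for t, m in zip(tokens, mask)
    ]


if __name__ == "__main__":
    core = [False] * 9
    core[4] = True  # centre patch of a 3x3 grid
    print(dilate_mask(core, 3, 3))
    print(inject_noise([[0.0, 0.0] for _ in range(9)], core, step=0))
```

In this toy setup the undilated mask serves as the set of core tokens for noise injection, while the dilated mask would cover a slightly larger region for regulation. Whether UVR separates the two in this way is not stated in the summary.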

Topics

Diffusion Transformers
Multimodal Attention
Image Generation Safety
Content Moderation
Image-to-Image Editing
Training-Free Methods

Best for: Research Scientist, Computer Vision Engineer, AI Scientist, Machine Learning Engineer, AI Security Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.