Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation

2026-03-17 · Source: cs.CV updates on arXiv.org · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Expert, extended

Summary

Safety-Guided Flow (SGF) is a unified probabilistic framework for safe generation in diffusion and flow models, motivated by the need for robust safety mechanisms as these models enter high-stakes domains. It brings existing heuristic methods such as Shielded Diffusion (Kirchhof et al., 2025) and Safe Denoiser (Kim et al., 2025b) under a single energy-based negative guidance approach built on a Maximum Mean Discrepancy (MMD) potential. Using control-barrier function analysis, the framework identifies a "critical time window" early in the denoising process in which negative guidance must be strong; after that window the guidance strength decays to zero, which preserves both safety and generation quality. Experiments confirm that applying guidance only in early steps, for time windows such as [1.0, 0.8] or [1.0, 0.6], significantly reduces attack success rates (ASR) on nudity prompts and mitigates memorization while preserving diversity and image fidelity.

Key takeaway

For research scientists and engineers developing or deploying generative AI, the temporal dynamics of safety guidance matter. The reported results indicate that safety and fidelity both improve when negative guidance is applied strongly in the early stages of denoising rather than uniformly throughout. Applying guidance beyond this "critical window" can degrade image quality and stability, so concentrate safety interventions in the early steps, where they have the most effect.

Key insights

Early, strong negative guidance within a critical time window is crucial for safe and high-quality generative model outputs.

Principles

MMD potential gradients create repulsive vector fields.
Control-barrier functions justify time-varying guidance strength.
Early denoising steps set coarse structure, requiring strong initial guidance.
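To make the first principle concrete, here is a minimal NumPy sketch of how the gradient of a kernel potential yields a repulsive vector field. It is a simplification under stated assumptions, not the paper's implementation: it keeps only the cross term between one sample and a set of unsafe reference points, V(x) = mean_j k(x, u_j), with an RBF kernel. The names `rbf_kernel`, `repulsive_field`, `unsafe_refs`, and `sigma` are hypothetical.

```python
import numpy as np

def rbf_kernel(x, refs, sigma):
    """RBF kernel values k(x, u_j) between one sample x (d,) and refs (n, d)."""
    sq_dist = np.sum((refs - x) ** 2, axis=1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def repulsive_field(x, unsafe_refs, sigma):
    """Negative gradient of V(x) = mean_j k(x, u_j) with respect to x.

    grad V = mean_j k(x, u_j) * (u_j - x) / sigma^2, so -grad V points
    away from nearby unsafe references and vanishes far from them.
    (Assumption: a simplified stand-in for the full MMD potential.)
    """
    k = rbf_kernel(x, unsafe_refs, sigma)   # shape (n,)
    diffs = x - unsafe_refs                 # shape (n, d), points away from refs
    return (k[:, None] * diffs).mean(axis=0) / sigma ** 2

# Usage: a sample to the right of a single unsafe point is pushed further right.
x = np.array([1.0, 0.0])
unsafe_refs = np.array([[0.0, 0.0]])
field = repulsive_field(x, unsafe_refs, sigma=1.0)
```

Because the kernel weight decays with distance, the push is local: samples far from every unsafe reference are left essentially untouched.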

Method

SGF uses the gradient of an MMD potential to generate repulsive forces that push samples away from the unsafe distribution. Control-barrier analysis determines a critical time window in which guidance must be strong; the guidance strength then decays to zero. The guidance is applied in x0 space, that is, on the predicted clean sample.
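The time-varying guidance strength can be sketched as a simple schedule. This is an illustration only: the summary says the strength decays to zero after the critical window but does not give the decay shape, so the linear ramp, the function names `guidance_weight` and `guide_x0`, and the parameters `scale`, `window_end`, and `decay_end` are all assumptions. Time runs from t = 1.0 (start of denoising) down to t = 0.0.

```python
import numpy as np

def guidance_weight(t, scale=1.0, window_end=0.8, decay_end=0.6):
    """Guidance strength at time t, with t running from 1.0 down to 0.0.

    Full strength inside the window [1.0, window_end], then a linear ramp
    (an assumed decay shape) reaching zero at decay_end, and zero afterward.
    """
    if decay_end > window_end:
        raise ValueError("decay_end must not exceed window_end")
    if t >= window_end:
        return scale
    if t <= decay_end:
        return 0.0
    return scale * (t - decay_end) / (window_end - decay_end)

def guide_x0(x0_hat, repulsion, t, **schedule_kwargs):
    """Shift the clean-sample estimate x0_hat along a repulsion direction."""
    return x0_hat + guidance_weight(t, **schedule_kwargs) * repulsion

# Usage: the same repulsion vector has full effect early and none late.
x0_hat = np.zeros(2)
repulsion = np.array([1.0, 0.0])
early = guide_x0(x0_hat, repulsion, t=0.9)   # inside the window
late = guide_x0(x0_hat, repulsion, t=0.3)    # after the decay, unchanged
```

Setting `decay_end` equal to `window_end` gives a hard cutoff, which corresponds to applying guidance only inside a window such as [1.0, 0.8].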

In practice

Apply negative guidance in early denoising steps (e.g., [1.0, 0.8]).
Use RBF kernels for MMD potential in image generation.
Estimate kernel bandwidth empirically for adaptive guidance.
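One common way to estimate an RBF bandwidth empirically is the median heuristic, sketched below. The summary does not say which estimator the authors use, so treat this as a swapped-in, widely used default rather than their method; `median_bandwidth`, `refs`, and `eps` are hypothetical names.

```python
import numpy as np

def median_bandwidth(x, refs, eps=1e-12):
    """Median heuristic: sigma = median distance from x (d,) to refs (n, d).

    Recomputing sigma for the current sample adapts the kernel's reach to how
    far that sample is from the reference set. eps guards against sigma = 0.
    """
    dists = np.linalg.norm(refs - x, axis=1)
    return max(float(np.median(dists)), eps)

# Usage: distances from the origin are 5, 10, and 1, so the median is 5.
x = np.zeros(2)
refs = np.array([[3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])
sigma = median_bandwidth(x, refs)
```

A per-sample bandwidth keeps the kernel from saturating when the sample is far from every reference and from collapsing when it is very close.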

Topics

Safety-Guided Flow
Diffusion Models
Negative Guidance
Control Barrier Functions
Maximum Mean Discrepancy

Best for: Computer Vision Engineer, Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.