Safety-Guided Flow (SGF): A Unified Framework for Negative Guidance in Safe Generation
Summary
Safety-Guided Flow (SGF) is a unified probabilistic framework for safe generation in diffusion and flow models, motivated by the need for robust safety mechanisms as these models enter high-stakes domains. It unifies existing heuristic methods such as Shielded Diffusion (Kirchhof et al., 2025) and Safe Denoiser (Kim et al., 2025b) under a single energy-based negative guidance approach built on a Maximum Mean Discrepancy (MMD) potential. Using control-barrier function analysis, the framework identifies a "critical time window" early in the denoising process in which negative guidance must be strong; after that window the guidance decays to zero, which keeps generation both safe and high quality. Experiments confirm that applying guidance in the early steps, for example over the time windows [1.0, 0.8] or [1.0, 0.6], significantly reduces the attack success rate (ASR) for nudity prompts and mitigates memorization while preserving diversity and image fidelity.
Key takeaway
For research scientists and engineers developing or deploying generative AI, the temporal dynamics of safety guidance matter. The paper's analysis and experiments indicate that negative guidance applied strongly in the early stages of denoising gives better safety and fidelity than guidance applied uniformly across all steps. Applying guidance beyond this "critical window" can degrade image quality and stability, so concentrate safety interventions in the early steps.
Key insights
Early, strong negative guidance within a critical time window is crucial for safe and high-quality generative model outputs.
Principles
- MMD potential gradients create repulsive vector fields.
- Control-barrier functions justify time-varying guidance strength.
- Early denoising steps set coarse structure, requiring strong initial guidance.
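As a rough illustration of the first principle, the sketch below computes a repulsive direction from an RBF kernel. It assumes a simplified potential U(x) = mean_j k(x, y_j) over a set of unsafe reference points y_j, which is only the sample-dependent cross term of an MMD and not necessarily the exact potential used in the paper. The names `rbf_kernel`, `repulsive_direction`, and `unsafe_refs` are hypothetical.

```python
import numpy as np

def rbf_kernel(x, ys, bandwidth):
    """RBF kernel values k(x, y_j) between one point x (d,) and points ys (n, d)."""
    sq_dists = np.sum((ys - x) ** 2, axis=1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def potential(x, unsafe_refs, bandwidth):
    """Assumed potential U(x) = mean_j k(x, y_j); high near unsafe references."""
    return float(rbf_kernel(x, unsafe_refs, bandwidth).mean())

def repulsive_direction(x, unsafe_refs, bandwidth):
    """Negative gradient of U(x) with respect to x.

    For the RBF kernel, grad_x k(x, y) = -(x - y) / bandwidth^2 * k(x, y),
    so -grad U is a kernel-weighted average of (x - y_j) / bandwidth^2 and
    points away from nearby unsafe references.
    """
    k = rbf_kernel(x, unsafe_refs, bandwidth)   # shape (n,)
    diffs = x - unsafe_refs                     # shape (n, d)
    return (k[:, None] * diffs).mean(axis=0) / bandwidth ** 2

# Toy 2-D example: two unsafe references near the origin, one sample to the right.
unsafe_refs = np.array([[0.0, 0.0], [0.2, 0.0]])
x = np.array([1.0, 0.0])
direction = repulsive_direction(x, unsafe_refs, bandwidth=1.0)
```

In this toy setup the direction points along the positive first axis, away from both references, and a small step along it lowers the potential.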
Method
SGF uses the gradient of an MMD potential to generate a repulsive force that pushes samples away from the unsafe distribution. Control-barrier analysis determines a critical time window in which this guidance must be strong; after the window, the guidance strength decays to zero. The guidance is applied in x0 space.
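The time-varying strength described above can be sketched as a simple schedule. This is a minimal illustration under stated assumptions, not the paper's schedule: time t runs from 1.0 (pure noise) down to 0.0, the weight is constant inside the window and decays linearly over a short tail, and "x0 space" is taken to mean the model's estimate of the clean sample. The names `guidance_weight`, `guide_x0`, and `decay_width` are hypothetical.

```python
import numpy as np

def guidance_weight(t, strength=1.0, t_end=0.8, decay_width=0.1):
    """Guidance weight at time t for a window [1.0, t_end].

    Full strength while t >= t_end, linear decay to zero over the next
    `decay_width` units of time, and exactly zero afterwards.
    """
    if t >= t_end:
        return strength
    if decay_width <= 0:
        return 0.0
    frac = (t - (t_end - decay_width)) / decay_width
    return strength * max(0.0, frac)

def guide_x0(x0_hat, direction, t, **schedule_kwargs):
    """Shift the clean-sample estimate along a repulsive direction."""
    return x0_hat + guidance_weight(t, **schedule_kwargs) * direction

# Toy check: the same direction is applied early and ignored late.
x0_hat = np.zeros(2)
direction = np.ones(2)
early = guide_x0(x0_hat, direction, t=0.9)   # inside the window
late = guide_x0(x0_hat, direction, t=0.3)    # well after the window
```

Setting `decay_width=0.0` turns this into a hard cutoff at `t_end`; the tail is there only to avoid an abrupt change in the update between adjacent steps.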
In practice
- Apply negative guidance in the early denoising steps (e.g., over the time window [1.0, 0.8]).
- Use RBF kernels for MMD potential in image generation.
- Estimate kernel bandwidth empirically for adaptive guidance.
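One common way to estimate an RBF bandwidth from data is the median heuristic, sketched below. The summary does not say which estimator the authors use, so treat this as an assumed stand-in; `median_bandwidth` is a hypothetical helper, and for images the rows of `samples` would typically be flattened pixels or feature embeddings.

```python
import numpy as np

def median_bandwidth(samples):
    """Median of pairwise Euclidean distances among samples of shape (n, d)."""
    diffs = samples[:, None, :] - samples[None, :, :]       # (n, n, d)
    dists = np.sqrt(np.sum(diffs ** 2, axis=-1))            # (n, n)
    upper = dists[np.triu_indices(len(samples), k=1)]       # each pair once
    return float(np.median(upper))

# Toy example: three collinear points with pairwise distances 5, 10, and 5.
samples = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
bandwidth = median_bandwidth(samples)
```

Recomputing the bandwidth from the current batch at each guided step is one way to make the guidance adaptive, at the cost of an O(n^2) distance computation.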
Topics
- Safety-Guided Flow
- Diffusion Models
- Negative Guidance
- Control Barrier Functions
- Maximum Mean Discrepancy
Best for: Computer Vision Engineer, Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by cs.CV updates on arXiv.org.