CFG-Ctrl: Control-Based Classifier-Free Diffusion Guidance
Summary
CFG-Ctrl introduces a unified framework that reinterprets Classifier-Free Guidance (CFG) in flow-based diffusion models as a control mechanism applied to the first-order continuous-time generative flow. The framework treats the conditional-unconditional discrepancy as an error signal and uses it to adjust the velocity field. Under this view, vanilla CFG is a proportional controller (P-control) with fixed gain, and existing variants mostly rely on linear control, which can lead to instability and overshooting, particularly at large guidance scales. To mitigate these issues, CFG-Ctrl proposes Sliding Mode Control CFG (SMC-CFG). SMC-CFG drives the generative flow toward a rapidly convergent sliding manifold by defining an exponential sliding mode surface over the semantic prediction error and adding a switching control term that provides nonlinear feedback. A Lyapunov stability analysis supports its finite-time convergence. Experiments with Stable Diffusion 3.5, Flux, and Qwen-Image show that SMC-CFG improves semantic alignment and robustness over standard CFG across a range of guidance scales.
Key takeaway
For research scientists and engineers developing or deploying diffusion models, SMC-CFG offers a way to improve the stability and semantic fidelity of guided generation. The reported experiments show better robustness across a range of guidance scales, which matters most for applications that need precise semantic alignment. Consider SMC-CFG where linear CFG approaches overshoot or become unstable at large guidance scales.
Key insights
Reinterpreting CFG as a control system makes nonlinear controllers available for guidance, and these improve stability and semantic alignment over linear ones.
Principles
- Linear control in CFG can cause instability and overshooting, particularly at large guidance scales.
- Nonlinear feedback enhances semantic fidelity.
- Lyapunov stability analysis supports finite-time convergence.
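The linear-control reading of standard CFG can be written out explicitly. The notation below is chosen here for illustration and may differ from the paper's: v_theta is the learned velocity field, c the condition, the empty set the null condition, and w the guidance scale. The error signal is the conditional-unconditional discrepancy, and the guided velocity adds that error back with a fixed gain, which is what makes it a proportional controller.

```latex
\begin{aligned}
e_t &= v_\theta(x_t, t, c) - v_\theta(x_t, t, \varnothing)
  && \text{(error signal)} \\
\hat{v}_t &= v_\theta(x_t, t, \varnothing) + w \, e_t
  && \text{(P-control with fixed gain } w\text{)}
\end{aligned}
```

Because the correction is linear in e_t, scaling w up scales the whole correction with it, which is consistent with the overshooting reported at large guidance scales.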
Method
SMC-CFG defines an exponential sliding mode surface over semantic prediction error and uses a switching control term for nonlinear feedback, ensuring rapid convergence.
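As a rough illustration of the idea, the sketch below contrasts vanilla CFG with a generic sliding-mode-style correction in NumPy. It is not the paper's algorithm: this summary does not give the exact surface or update rule, so the surface `s = (e_t - e_prev) + lam * e_prev`, the switching term `-k * sign(s)`, the function names, and the parameters `lam` and `k` are all assumptions borrowed from textbook sliding mode control.

```python
import numpy as np


def cfg_p_control(v_cond, v_uncond, scale):
    """Vanilla CFG: proportional feedback on e = v_cond - v_uncond."""
    error = v_cond - v_uncond
    return v_uncond + scale * error


def smc_cfg_sketch(v_cond, v_uncond, prev_error, scale, lam=0.5, k=0.05):
    """Hypothetical sliding-mode-style guidance step (illustrative only).

    Assumed surface: s = (e_t - e_prev) + lam * e_prev. On s = 0 the error
    shrinks by a factor (1 - lam) per step, i.e. it decays exponentially.
    """
    error = v_cond - v_uncond
    surface = (error - prev_error) + lam * prev_error
    # Switching term: nonlinear feedback that pushes the error toward s = 0.
    corrected = error - k * np.sign(surface)
    guided = v_uncond + scale * corrected
    # Return the raw error so the caller can pass it in as prev_error next step.
    return guided, error


# Toy usage with made-up velocity predictions (not model outputs).
v_uncond = np.zeros(2)
v_cond = np.array([1.0, -1.0])
guided, err = smc_cfg_sketch(v_cond, v_uncond, prev_error=np.zeros(2), scale=7.0)
```

When the surface is exactly zero, `np.sign` returns 0 and the step reduces to vanilla CFG, so the switching term only acts while the error is off the assumed manifold.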
In practice
- Apply SMC-CFG to text-to-image models.
- Improve robustness at large guidance scales.
- Enhance semantic alignment in diffusion models.
Topics
- Classifier-Free Guidance
- Diffusion Models
- Sliding Mode Control
- Text-to-Image Generation
- Generative Flows
Best for: Computer Vision Engineer, Research Scientist, AI Researcher, AI Scientist, Deep Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.