ComfyUI Update: Stable Video Diffusion on 8GB VRAM with 25 frames and more.

2023-11-24 · Source: ComfyUI blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Emerging Technologies & Innovation · Depth: Intermediate, quick

Summary

This ComfyUI update adds support for the Stable Video Diffusion image-to-video model, which can generate 25-frame videos at 1024x576 on a GPU with 8GB of VRAM, such as a GTX 1080. It also adds support for LCM models and their corresponding LoRAs for faster sampling, and the Kohya Deep Shrink "PatchModelAddDownscale" node for generating consistent high-resolution images without a second pass. ZSNR v-prediction models are supported through the new "ModelSamplingDiscrete" and "RescaleCFG" nodes. Other changes include TAESD support in the "Load VAE" node for high-quality previews, a "SaveAnimatedWEBP" node, UI improvements such as color schemes and loading workflows from API format, and new nodes for image manipulation and latent interpolation.

Key takeaway

For AI and machine learning engineers working with generative models, this update brings image-to-video generation on 8GB GPUs, faster sampling, and more consistent high-resolution output. Try the Stable Video Diffusion integration for video generation tasks and LCM models or LoRAs for faster image sampling. Consider Kohya Deep Shrink when high-resolution images come out inconsistent, and use the ZSNR v-prediction nodes when working with models of that type.

Key insights

ComfyUI's latest update enhances video generation, sampling efficiency, and high-resolution image consistency.

Principles

Efficient sampling (LCM models and LoRAs) reduces the number of generation steps.
Scheduled downscaling (Kohya Deep Shrink) improves high-resolution consistency.

Method

To use ZSNR v_pred models, load the checkpoint with the regular checkpoint loader, pass the model through "ModelSamplingDiscrete" with v_pred and zsnr selected, then pass the result through the "RescaleCFG" node.
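
The chain described above can be sketched as a workflow fragment in ComfyUI's API format, written here as a Python dict. This is a minimal illustration under stated assumptions, not the post's own workflow: the checkpoint filename is hypothetical, the "v_prediction" option string and the "multiplier" input with its value are assumptions based on the node definitions, and the fragment covers only the model path (no prompts, sampler, or decode).

```python
import json

# Hypothetical checkpoint filename; replace with a real ZSNR v-prediction model.
CKPT_NAME = "my_zsnr_vpred_model.safetensors"

# API-format fragment: node id -> class_type + inputs.
# A link to another node's output is written as [source_node_id, output_index].
workflow = {
    "1": {
        "class_type": "CheckpointLoaderSimple",  # the regular checkpoint loader
        "inputs": {"ckpt_name": CKPT_NAME},
    },
    "2": {
        "class_type": "ModelSamplingDiscrete",
        "inputs": {
            "model": ["1", 0],           # MODEL output of the loader
            "sampling": "v_prediction",  # assumed option string for v_pred
            "zsnr": True,
        },
    },
    "3": {
        "class_type": "RescaleCFG",
        "inputs": {
            "model": ["2", 0],
            "multiplier": 0.7,  # assumed input name and value; tune per model
        },
    },
}


def model_chain(wf, last_id):
    """Follow "model" links backwards and return class types in load order."""
    chain = []
    node_id = last_id
    while node_id is not None:
        node = wf[node_id]
        chain.append(node["class_type"])
        link = node["inputs"].get("model")
        node_id = link[0] if link else None
    return list(reversed(chain))


print(" -> ".join(model_chain(workflow, "3")))
print(json.dumps(workflow, indent=2))
```

The output of node "3" is what a sampler node's model input would connect to in a complete workflow.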

In practice

Generate 1024x576 videos with Stable Video Diffusion.
Use LCM LoRAs for faster SDXL/SD1.x sampling.
Add "PatchModelAddDownscale" for consistent high-res images.
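
The LCM LoRA and Deep Shrink items above can be sketched as model patches in ComfyUI's API format. Treat this as a hedged illustration: the filenames, node ids, and helper functions are hypothetical, the input names and default values reflect the node definitions as I understand them (newer ComfyUI versions may require additional inputs on "PatchModelAddDownscale"), and the LCM sampler values are common starting points rather than settings taken from the post.

```python
# Typical LCM starting values; assumptions for illustration, not from the post.
LCM_SAMPLER_HINTS = {"sampler_name": "lcm", "steps": 4, "cfg": 1.5}


def lcm_lora_patch(model_link, clip_link, lora_name, lora_id="10", lcm_id="11"):
    """Apply an LCM LoRA, then switch the patched model to LCM sampling."""
    return {
        lora_id: {
            "class_type": "LoraLoader",
            "inputs": {
                "model": model_link,
                "clip": clip_link,
                "lora_name": lora_name,
                "strength_model": 1.0,
                "strength_clip": 1.0,
            },
        },
        lcm_id: {
            "class_type": "ModelSamplingDiscrete",
            "inputs": {"model": [lora_id, 0], "sampling": "lcm", "zsnr": False},
        },
    }


def deep_shrink_patch(model_link, node_id="20"):
    """Kohya Deep Shrink: downscale inside the model for the early part of sampling."""
    return {
        node_id: {
            "class_type": "PatchModelAddDownscale",
            "inputs": {
                "model": model_link,
                "block_number": 3,        # assumed default
                "downscale_factor": 2.0,  # assumed default
                "start_percent": 0.0,     # downscaling is active from the start
                "end_percent": 0.35,      # ...and switched off after this point
                "downscale_after_skip": True,
            },
        }
    }


# Hypothetical checkpoint and LoRA filenames.
checkpoint = {
    "1": {
        "class_type": "CheckpointLoaderSimple",
        "inputs": {"ckpt_name": "sdxl_base.safetensors"},
    }
}
fast = {**checkpoint, **lcm_lora_patch(["1", 0], ["1", 1], "lcm_lora_sdxl.safetensors")}
highres = {**checkpoint, **deep_shrink_patch(["1", 0])}

print(sorted(fast), sorted(highres))
```

In both fragments the last node's MODEL output would feed a sampler; for the LCM variant the sampler would use low step counts and a low CFG along the lines of LCM_SAMPLER_HINTS.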

Topics

ComfyUI
Stable Video Diffusion
Latent Consistency Models
ZSNR V Prediction
Image Generation Workflows

Best for: Machine Learning Engineer, Deep Learning Engineer, AI Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by ComfyUI blog.