Budget-Constrained Step-Level Diffusion Caching

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

BudCache accelerates diffusion models by optimizing step-level caching under a fixed compute budget. Existing heuristic caching methods apply per-step error thresholds, which makes inference latency vary from run to run and does not directly optimize output quality. BudCache inverts this: the compute budget is fixed first, and an offline search combining Simulated Annealing with deterministic Hill Climbing then identifies the cache policy that best preserves final output quality. For very tight budgets, BudCache additionally introduces cache-aware schedule alignment, which adapts the time discretization. Experiments on FLUX.1-dev and Wan2.1 show that BudCache achieves better generation quality than heuristic caching baselines under identical inference budgets.
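To make the contrast concrete, here is a minimal sketch of why a per-step threshold gives variable latency while a fixed budget does not. Everything in it is an illustrative assumption rather than BudCache's code: the function names `threshold_policy` and `budget_policy`, the per-step drift traces, the threshold `tau`, and the evenly spaced placement used as a placeholder for a searched policy.

```python
def threshold_policy(drift, tau):
    """Heuristic rule: rerun the model once accumulated drift exceeds tau."""
    mask, acc = [], 0.0
    for step, d in enumerate(drift):
        acc += d
        if step == 0 or acc > tau:
            mask.append(True)    # full model evaluation
            acc = 0.0
        else:
            mask.append(False)   # reuse the cached output
    return mask


def budget_policy(num_steps, budget):
    """Fixed-budget rule: exactly `budget` full evaluations, evenly spaced."""
    if not 2 <= budget <= num_steps:
        raise ValueError("budget must be between 2 and num_steps")
    chosen = {i * (num_steps - 1) // (budget - 1) for i in range(budget)}
    return [step in chosen for step in range(num_steps)]


# Hypothetical per-step drift traces for two different inputs.
smooth_prompt = [0.05] * 20   # slowly changing sample
busy_prompt = [0.20] * 20     # rapidly changing sample

for name, drift in [("smooth", smooth_prompt), ("busy", busy_prompt)]:
    n_threshold = sum(threshold_policy(drift, tau=0.3))
    n_budget = sum(budget_policy(len(drift), budget=8))
    print(f"{name}: threshold -> {n_threshold} full steps, "
          f"budget -> {n_budget} full steps")
```

Under the threshold rule the number of full model evaluations depends on the input, so latency does too; under the budget rule it is the same for every input, and the remaining question is which steps to spend it on.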

Key takeaway

For MLOps engineers managing inference cost and latency for diffusion models, BudCache offers an alternative to heuristic caching. Because the compute budget is fixed upfront and the cache policy is then optimized for output quality, latency becomes predictable and operational costs can potentially be lowered. Consider evaluating BudCache's offline search approach when tuning diffusion model deployments for both efficiency and generation quality.

Key insights

Optimizing a diffusion model's cache policy under a fixed compute budget yields better generation quality than heuristic caching baselines given the same inference budget.

Principles

Method

BudCache combines Simulated Annealing with deterministic Hill Climbing for offline cache policy search. It also uses cache-aware schedule alignment to adapt time discretization for tight compute budgets.
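The search described above can be sketched as follows. This is a toy illustration under stated assumptions, not the authors' implementation: a policy is modeled as a boolean mask over sampling steps (True means run the model, False means reuse the cache), the budget is the number of True entries, and `policy_cost` is a hypothetical stand-in for the real objective, which per the summary is final output quality. The swap move, the linear cooling schedule, and all function names are likewise assumptions.

```python
import math
import random


def policy_cost(mask):
    """Toy proxy for output error: penalize long runs of cached steps."""
    cost, gap = 0, 0
    for computed in mask:
        if computed:
            cost += gap * gap
            gap = 0
        else:
            gap += 1
    return cost + gap * gap


def swap(mask, i, j):
    """Exchange two entries; the number of computed steps is unchanged."""
    out = list(mask)
    out[i], out[j] = out[j], out[i]
    return out


def simulated_annealing(mask, iters=2000, t0=5.0, seed=0):
    rng = random.Random(seed)
    cur, cur_cost = list(mask), policy_cost(mask)
    best, best_cost = cur, cur_cost
    for k in range(iters):
        temp = t0 * (1 - k / iters) + 1e-9      # linear cooling (assumed)
        on = [i for i in range(1, len(cur)) if cur[i]]
        off = [i for i in range(1, len(cur)) if not cur[i]]
        if not on or not off:
            break
        cand = swap(cur, rng.choice(on), rng.choice(off))
        c = policy_cost(cand)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / temp):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c
    return best


def hill_climb(mask):
    """Deterministic refinement: take improving swaps until none remain."""
    cur, cur_cost = list(mask), policy_cost(mask)
    improved = True
    while improved:
        improved = False
        for i in range(1, len(cur)):
            for j in range(1, len(cur)):
                if cur[i] and not cur[j]:
                    cand = swap(cur, i, j)
                    c = policy_cost(cand)
                    if c < cur_cost:
                        cur, cur_cost, improved = cand, c, True
    return cur


def search_policy(num_steps, budget, seed=0):
    # Step 0 is always computed because nothing is cached yet.
    start = [True] * budget + [False] * (num_steps - budget)
    return hill_climb(simulated_annealing(start, seed=seed))


policy = search_policy(num_steps=20, budget=8)
print("computed steps:", [i for i, c in enumerate(policy) if c])
```

Because every move is a swap, each candidate uses exactly the same number of full model evaluations, so the budget constraint holds by construction and the search only decides where to spend it. In a real setting `policy_cost` would be replaced by a measurement of final output quality, which is far more expensive and is the reason the search runs offline.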

Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.