Budget-Constrained Step-Level Diffusion Caching

2026-06-11 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Computer Vision · Depth: Expert, quick

Summary

BudCache accelerates diffusion models by optimizing step-level caching under a fixed compute budget. Existing heuristic caching methods apply per-step error thresholds, which makes inference latency vary from run to run and does not directly optimize final output quality. BudCache inverts this: the compute budget is set in advance, and an offline search combining Simulated Annealing with deterministic Hill Climbing identifies the cache policy that best preserves final output quality. For very tight budgets, BudCache also introduces cache-aware schedule alignment, which adapts the time discretization. In experiments on FLUX.1-dev and Wan2.1, BudCache achieves better generation quality than heuristic caching baselines under identical inference budgets.

Key takeaway

For MLOps engineers managing inference cost and latency for diffusion models, BudCache offers an alternative to heuristic caching: fix the compute budget upfront, then optimize the cache policy for output quality. This makes performance more predictable and may lower operational costs. Its offline search approach is worth evaluating for deployments that need both efficiency and generation quality.

Key insights

Optimizing a diffusion model's cache policy under a fixed compute budget yields better generation quality than heuristic caching at the same budget.

Principles

Fixing the compute budget and optimizing the policy outperforms per-step threshold heuristics.
Offline cache policy search avoids online inference overhead.
Adapting time discretization can mitigate cache-induced trajectory mismatch.
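
The first principle can be illustrated with a small sketch. It is not taken from BudCache or from any specific baseline: threshold_policy is a generic, hypothetical accumulate-and-compare rule, fixed_budget_policy is a plain evenly spaced mask, and the per-step change values are made up. The point is only that a threshold rule lets the number of model calls depend on the input, while a budgeted policy fixes it by construction.

```python
def threshold_policy(change_per_step, threshold):
    """Generic threshold heuristic (illustrative, not a specific published method).

    Recompute the model output whenever the change accumulated since the
    last recompute exceeds `threshold`; otherwise reuse the cached output.
    Returns a list of booleans: True = compute, False = reuse cache.
    """
    policy, accumulated = [], 0.0
    for step, change in enumerate(change_per_step):
        accumulated += change
        if step == 0 or accumulated > threshold:  # step 0 has nothing to reuse
            policy.append(True)
            accumulated = 0.0
        else:
            policy.append(False)
    return policy


def fixed_budget_policy(num_steps, budget):
    """Evenly spaced mask with exactly `budget` computed steps (budget <= num_steps)."""
    computed = {(k * num_steps) // budget for k in range(budget)}
    return [step in computed for step in range(num_steps)]


# Made-up per-step change estimates for two hypothetical inputs.
smooth_input = [0.1] * 10
busy_input = [0.3] * 10

if __name__ == "__main__":
    for name, changes in [("smooth", smooth_input), ("busy", busy_input)]:
        calls = sum(threshold_policy(changes, threshold=0.25))
        print(f"threshold rule, {name} input: {calls} model calls")
    print("fixed budget, any input:", sum(fixed_budget_policy(10, 4)), "model calls")
```

With the same threshold, the two inputs trigger different numbers of model calls, so latency varies. The budgeted mask always costs the same; what remains is choosing which steps to compute, which is the job of the offline search.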

Method

BudCache combines Simulated Annealing with deterministic Hill Climbing for offline cache policy search. It also uses cache-aware schedule alignment to adapt time discretization for tight compute budgets.
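
A minimal sketch of what such a search can look like follows. It is an illustration under stated assumptions, not the BudCache implementation: the toy Euler sampler in rollout, the cost function, the swap move, the cooling schedule, and every function name are hypothetical stand-ins. The only ideas taken from the description above are that the policy is searched offline, that the number of computed steps is fixed, that the objective compares final outputs, and that Simulated Annealing is followed by deterministic Hill Climbing.

```python
import math
import random


def rollout(policy):
    """Toy Euler sampler: call the 'model' on True steps, reuse the cache otherwise."""
    x, cached = 1.0, 0.0
    dt = 1.0 / len(policy)
    for i, compute in enumerate(policy):
        if compute:
            cached = -x * (1.0 + 4.0 * i * dt)  # stand-in for an expensive denoiser call
        x += dt * cached
    return x


def cost(policy, reference):
    """Final-output error against the uncached run (toy quality proxy)."""
    return abs(rollout(policy) - reference)


def swap(policy, i, j):
    """Cache step i and compute step j instead, so the budget stays unchanged."""
    p = list(policy)
    p[i], p[j] = False, True
    return tuple(p)


def split(policy):
    """Indices that may be switched off (never step 0) and indices currently cached."""
    on = [i for i, c in enumerate(policy) if c and i != 0]
    off = [i for i, c in enumerate(policy) if not c]
    return on, off


def simulated_annealing(policy, reference, rng, iters=500, t0=0.05, t1=1e-4):
    cur, cur_cost = policy, cost(policy, reference)
    best, best_cost = cur, cur_cost
    for k in range(iters):
        temp = t0 * (t1 / t0) ** (k / (iters - 1))  # geometric cooling
        on, off = split(cur)
        cand = swap(cur, rng.choice(on), rng.choice(off))
        c = cost(cand, reference)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if c <= cur_cost or rng.random() < math.exp((cur_cost - c) / temp):
            cur, cur_cost = cand, c
            if c < best_cost:
                best, best_cost = cand, c
    return best, best_cost


def hill_climb(policy, reference):
    """Deterministic first-improvement descent over all budget-preserving swaps."""
    cur, cur_cost = policy, cost(policy, reference)
    improved = True
    while improved:
        improved = False
        on, off = split(cur)
        for cand in (swap(cur, i, j) for i in on for j in off):
            c = cost(cand, reference)
            if c < cur_cost:
                cur, cur_cost, improved = cand, c, True
                break
    return cur, cur_cost


def search_policy(num_steps=20, budget=8, seed=0):
    if not 2 <= budget < num_steps:
        raise ValueError("this sketch assumes 2 <= budget < num_steps")
    rng = random.Random(seed)
    reference = rollout((True,) * num_steps)  # uncached run
    start = (True,) * budget + (False,) * (num_steps - budget)
    annealed, _ = simulated_annealing(start, reference, rng)
    return hill_climb(annealed, reference)


if __name__ == "__main__":
    policy, err = search_policy()
    print("computed steps:", [i for i, c in enumerate(policy) if c])
    print("final-output error:", err)
```

Because every move swaps one computed step for one cached step, any policy the search visits costs exactly the same number of model calls. In a real setting the cost function would be far more expensive (full generations scored against uncached outputs), which is why running the search offline matters.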

In practice

Implement BudCache for predictable diffusion model inference latency.
Apply offline policy search to optimize resource allocation.
Utilize cache-aware schedule alignment for highly constrained environments.

Topics

Diffusion Models
Inference Acceleration
Caching Strategies
Compute Budget Optimization
Simulated Annealing
Hill Climbing

Code references

Westlake-AGI-Lab/BudCache

Best for: AI Engineer, Computer Vision Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.