OTCache: Optimal Transport for Geometry-Aware Caching in Diffusion Models
Summary
OTCache is a training-free framework that accelerates diffusion model sampling by improving how caching schedules are predicted. Existing graph-based caching methods rely on an additive independence assumption that breaks down in low number-of-function-evaluations (NFE) regimes. OTCache instead takes an Optimal Transport (OT)-inspired view and models caching schedules as a smooth evolution within policy space. The framework operates in three stages: first, it establishes a high-fidelity reference schedule using a graph-based method under a conservative budget; second, it runs a lightweight anchor search with Optuna, optimizing an end-to-end perceptual objective in an extreme low-budget setting; finally, it predicts schedules for target budgets by quantile interpolation between the reference and anchor policies, using continuous warping representations. Released on 2026-06-30, OTCache achieves 4.5x acceleration on FLUX.1 [dev], 4.7x on Qwen-Image, and 3.66x on HunyuanVideo, while consistently improving generation fidelity over state-of-the-art caching baselines.
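The third stage (quantile interpolation between a reference and an anchor schedule) can be illustrated with a minimal sketch. It rests on assumptions of ours, not the paper's stated representation: a schedule is the sorted list of step indices at which the full network runs; its "continuous warping representation" is the piecewise-linear map from normalized rank to normalized timestep (an empirical quantile function); and the interpolation weight is linear in the compute budget. The helper names `warp_from_schedule`, `interpolate_schedule`, and `budget_to_t` are hypothetical.

```python
from bisect import bisect_right


def warp_from_schedule(steps, n_steps):
    """Map a sorted list of full-compute step indices to a monotone,
    piecewise-linear warp w: [0, 1] -> [0, 1] (an empirical quantile function).
    Hypothetical representation; needs at least two compute steps."""
    k = len(steps)
    xs = [j / (k - 1) for j in range(k)]          # normalized rank
    ys = [s / (n_steps - 1) for s in steps]       # normalized timestep

    def warp(u):
        if u <= 0.0:
            return ys[0]
        if u >= 1.0:
            return ys[-1]
        j = bisect_right(xs, u) - 1               # segment containing u
        frac = (u - xs[j]) / (xs[j + 1] - xs[j])
        return ys[j] + frac * (ys[j + 1] - ys[j])

    return warp


def budget_to_t(k, k_ref, k_anchor):
    """Hypothetical: interpolation weight linear in the number of compute steps."""
    return (k_ref - k) / (k_ref - k_anchor)


def interpolate_schedule(ref_steps, anchor_steps, t, k, n_steps):
    """Blend the two quantile functions with weight t (0 = reference,
    1 = anchor) and resample k distinct compute steps in [0, n_steps - 1]."""
    w_ref = warp_from_schedule(ref_steps, n_steps)
    w_anc = warp_from_schedule(anchor_steps, n_steps)
    steps = []
    for i in range(k):
        u = i / (k - 1)
        y = (1 - t) * w_ref(u) + t * w_anc(u)     # pointwise quantile average
        steps.append(round(y * (n_steps - 1)))
    # Rounding can create collisions: repair to strictly increasing, in-range indices.
    for i in range(1, k):
        steps[i] = max(steps[i], steps[i - 1] + 1)
    steps[-1] = min(steps[-1], n_steps - 1)
    for i in range(k - 2, -1, -1):
        steps[i] = min(steps[i], steps[i + 1] - 1)
    return steps


# Toy usage: 50 sampling steps, a 20-step reference and an 8-step anchor (made-up schedules).
N = 50
ref = [0, 1, 2, 3, 4, 5, 7, 9, 11, 14, 17, 20, 23, 26, 30, 34, 38, 42, 46, 49]
anchor = [0, 2, 5, 10, 18, 28, 39, 49]
k_target = 12
t = budget_to_t(k_target, len(ref), len(anchor))
target = interpolate_schedule(ref, anchor, t, k_target, N)
print(target)
```

Because a convex combination of two monotone warps is still monotone, every intermediate budget yields a valid schedule, and the endpoints t = 0 and t = 1 reproduce the reference and anchor exactly.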
Key takeaway
For machine learning engineers optimizing diffusion model inference, OTCache offers substantial acceleration without retraining. If existing caching methods degrade fidelity in your low-NFE settings, consider OTCache's Optimal Transport-inspired approach: the reported speedups reach 4.7x (on Qwen-Image) while fidelity improves relative to state-of-the-art caching baselines, making it a compelling option for efficient deployment.
Key insights
OTCache accelerates diffusion model sampling by modeling caching schedules as a smooth, Optimal Transport-inspired evolution across inference budgets.
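The connection between Optimal Transport and quantile interpolation rests on a standard one-dimensional fact, stated here as a sketch of the intuition rather than the paper's exact formulation. Reading a caching schedule as a probability measure over normalized timesteps is our assumption. For measures mu_0 and mu_1 on the real line with finite second moments and quantile functions F_0^{-1} and F_1^{-1}, the 2-Wasserstein distance and its geodesic (McCann) interpolant are given by quantile functions alone:

```latex
W_2^2(\mu_0, \mu_1) = \int_0^1 \bigl| F_0^{-1}(u) - F_1^{-1}(u) \bigr|^2 \, du,
\qquad
F_t^{-1}(u) = (1 - t)\, F_0^{-1}(u) + t\, F_1^{-1}(u), \quad t \in [0, 1],
\qquad
W_2(\mu_s, \mu_t) = |t - s| \, W_2(\mu_0, \mu_1).
```

The last identity follows because F_t^{-1} - F_s^{-1} = (t - s)(F_1^{-1} - F_0^{-1}): the interpolant moves at constant speed between the two endpoints, which is one way to make "smooth evolution across budgets" precise.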
Principles
- Caching schedules evolve smoothly across inference budgets.
- Low-NFE regimes break the additive independence assumption that graph-based caching methods rely on.
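To make the additive independence assumption concrete, here is a minimal, generic sketch of a graph-based scheduler; it is not the specific baseline OTCache builds on. Timesteps are nodes; computing at step i and reusing its cache until step j is an edge with cost c(i, j); and the schedule's total error is assumed to be the sum of edge costs, so a dynamic program over a fixed compute budget finds the cheapest path. The function `graph_schedule` and the quadratic toy cost are hypothetical. At low NFE each reuse spans more steps and perturbs the inputs to later segments, so edge costs plausibly stop being independent, which is where the additive model is said to break down.

```python
import math


def graph_schedule(n_steps, k, seg_cost):
    """Choose k full-compute steps (step 0 always computed, 1 <= k <= n_steps)
    minimizing an additive sum of segment costs. seg_cost(i, j) is the error of
    computing at step i and reusing its cache for steps i+1 .. j-1; j == n_steps
    closes the final segment at the end of sampling."""
    # best[c][j]: min cost of a prefix that uses c compute steps, the last at step j
    best = [[math.inf] * n_steps for _ in range(k + 1)]
    parent = [[-1] * n_steps for _ in range(k + 1)]
    best[1][0] = 0.0
    for c in range(2, k + 1):
        for j in range(1, n_steps):
            for i in range(j):
                cand = best[c - 1][i] + seg_cost(i, j)   # additivity assumed here
                if cand < best[c][j]:
                    best[c][j], parent[c][j] = cand, i
    # Add the cost of the final segment and pick the best last compute step.
    last = min(range(n_steps), key=lambda j: best[k][j] + seg_cost(j, n_steps))
    total = best[k][last] + seg_cost(last, n_steps)
    # Walk parent pointers back to step 0.
    steps, c, j = [], k, last
    while j != -1:
        steps.append(j)
        j, c = parent[c][j], c - 1
    return steps[::-1], total


def toy_cost(i, j):
    """Hypothetical cost: squared number of consecutive steps served from cache."""
    return (j - i - 1) ** 2


schedule, cost = graph_schedule(10, 5, toy_cost)
print(schedule, cost)
```

With this uniform toy cost the optimum spreads the five compute steps evenly; a practical scheduler would need estimated, model-specific segment costs instead.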
Method
OTCache establishes a reference schedule, performs an anchor search via Optuna with a perceptual objective, then predicts target schedules using quantile interpolation.
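The anchor search can be sketched as black-box optimization over low-budget schedules. To keep the sketch runnable with the standard library, plain random search stands in for Optuna's sampler, and `objective` is a hypothetical placeholder for the end-to-end perceptual metric, which in practice would require running the model. The names `anchor_search` and `toy_objective` and all parameter values are illustrative.

```python
import random


def anchor_search(n_steps, k_anchor, objective, n_trials=200, seed=0):
    """Random search over schedules with k_anchor full-compute steps
    (step 0 always computed), a stand-in for an Optuna study.
    objective(steps) -> float, lower is better (hypothetical perceptual loss)."""
    rng = random.Random(seed)
    best_steps, best_val = None, float("inf")
    for _ in range(n_trials):
        # Sample k_anchor - 1 distinct steps from 1 .. n_steps - 1 and prepend step 0.
        steps = [0] + sorted(rng.sample(range(1, n_steps), k_anchor - 1))
        val = objective(steps)
        if val < best_val:
            best_steps, best_val = steps, val
    return best_steps, best_val


def toy_objective(steps, n_steps=50):
    """Hypothetical proxy, not a perceptual metric: penalize long runs of
    cached steps, weighting runs that start earlier more (an arbitrary choice)."""
    bounds = steps + [n_steps]
    loss = 0.0
    for a, b in zip(bounds, bounds[1:]):
        skipped = b - a - 1
        loss += skipped ** 2 * (1.0 + (n_steps - a) / n_steps)
    return loss


best, val = anchor_search(50, 8, toy_objective, n_trials=500)
print(best, round(val, 2))
```

In an actual Optuna study, the loop would be replaced by `optuna.create_study(direction="minimize")` and `study.optimize(objective, n_trials=...)`, with the schedule parameterized through `trial.suggest_int` calls; OTCache's actual search space may differ.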
In practice
- Achieves 4.5x, 4.7x, and 3.66x acceleration on FLUX.1 [dev], Qwen-Image, and HunyuanVideo, respectively.
- Consistently improves generation fidelity over state-of-the-art caching baselines.
Topics
- Diffusion Models
- Model Acceleration
- Optimal Transport
- Caching Schedules
- Inference Optimization
- Optuna
Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.