OTCache: Optimal Transport for Geometry-Aware Caching in Diffusion Models

2026-06-30 · Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

OTCache is a training-free framework that accelerates diffusion-model sampling by improving how caching schedules are predicted. Existing graph-based caching methods rely on an additive-independence assumption (a schedule's total error is treated as a sum of independent terms), and that assumption breaks down in low-NFE (number of function evaluations) regimes. OTCache instead takes an Optimal Transport (OT)-inspired view and models caching schedules as a smooth evolution within policy space. The framework operates in three stages. First, it builds a high-fidelity reference schedule with a graph-based method under a conservative budget. Second, it runs a lightweight anchor search with Optuna, optimizing an end-to-end perceptual objective under an extremely low budget. Finally, it predicts schedules for target budgets by quantile interpolation between the reference and anchor policies, using continuous warping representations. Released on 2026-06-30, OTCache reports 4.5x acceleration on FLUX.1 [dev], 4.7x on Qwen-Image, and 3.66x on HunyuanVideo, while consistently improving generation fidelity over state-of-the-art caching baselines.
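To make the terms concrete: a caching schedule decides, at each sampling step, whether to run the full model (one NFE) or reuse a cached result, and the budget is the number of full evaluations. The toy sketch below illustrates this under simplifying assumptions of our own: a scalar state, an Euler update, and reuse of the entire model output. Real caching methods, presumably including OTCache, cache internal features instead, and `sample_with_cache` is a hypothetical helper, not part of the UnicomAI/OTCache code.

```python
from typing import Callable, Sequence, Tuple


def sample_with_cache(
    model: Callable[[float, float], float],
    x0: float,
    timesteps: Sequence[float],
    compute_mask: Sequence[bool],
) -> Tuple[float, int]:
    """Toy Euler sampler that reuses the last model output on cached steps.

    Hypothetical helper for illustration only. compute_mask[i] is True when
    step i runs the model (one NFE) and False when the previous output is
    reused from the cache.
    """
    if len(compute_mask) != len(timesteps) - 1:
        raise ValueError("need exactly one mask entry per integration step")
    if not compute_mask or not compute_mask[0]:
        raise ValueError("the first step must compute: the cache starts empty")
    x, cached, nfe = x0, 0.0, 0
    for i, use_model in enumerate(compute_mask):
        t, t_next = timesteps[i], timesteps[i + 1]
        if use_model:
            cached = model(x, t)  # full forward pass refreshes the cache
            nfe += 1
        # Euler update with either the fresh or the cached model output
        x = x + (t_next - t) * cached
    return x, nfe
```

With a toy velocity v(x, t) = -x on ten uniform steps from 0 to 1 and x0 = 1, computing every step returns 0.9^10 at 10 NFE, while computing every other step returns 0.8^5 at 5 NFE: half the budget, at the cost of drifting from the full trajectory. Choosing *which* steps to compute under a fixed budget is the problem OTCache addresses.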

Key takeaway

For machine learning engineers optimizing diffusion-model inference, OTCache offers substantial acceleration without retraining. If existing caching methods degrade fidelity in your low-NFE settings, OTCache's Optimal Transport-inspired schedule prediction is worth evaluating: it reports up to 4.7x faster sampling while improving generation fidelity relative to other caching baselines, making it a compelling option for efficient deployment.

Key insights

OTCache accelerates diffusion model sampling by modeling caching schedules as a smooth evolution using Optimal Transport.

Principles

Caching schedules evolve smoothly across inference budgets.
Low-NFE regimes undermine the additive-independence assumption behind graph-based caching.
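The principle that schedules evolve smoothly has a standard one-dimensional OT counterpart. Assuming a schedule is viewed as a probability distribution of compute over the normalized timestep axis (our illustrative assumption, not a detail stated in this summary), the Wasserstein-2 geodesic between a reference distribution $\mu_0$ and an anchor distribution $\mu_1$ is obtained by linearly interpolating their quantile functions $F^{-1}$:

```latex
F_{\mu_t}^{-1}(u) = (1 - t)\,F_{\mu_0}^{-1}(u) + t\,F_{\mu_1}^{-1}(u),
\qquad u \in (0, 1),\; t \in [0, 1],
\qquad
W_2^2(\mu_0, \mu_1) = \int_0^1 \bigl|F_{\mu_0}^{-1}(u) - F_{\mu_1}^{-1}(u)\bigr|^2 \, du.
```

Every intermediate $\mu_t$ is a valid distribution, and each quantile moves along a straight line from its reference position to its anchor position, which is one way to make "smooth evolution across budgets" precise. How OTCache maps a target budget to $t$ is not specified in this summary.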

Method

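OTCache builds a reference schedule with a graph-based method under a conservative budget, searches for a low-budget anchor schedule via Optuna with an end-to-end perceptual objective, then predicts target-budget schedules by quantile interpolation between the reference and the anchor.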
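The interpolation stage can be sketched as follows. This is a minimal illustration under assumptions of our own, not the UnicomAI/OTCache implementation: a schedule is a sorted list of compute-step indices, piecewise-linear empirical quantile functions stand in for the paper's continuous warping representation, the interpolation weight is linear in the budget, and interpolated positions are snapped to the nearest free integer step. `quantile`, `interpolate_schedule`, and `_nearest_free` are hypothetical names.

```python
import math
from typing import List, Sequence, Set


def quantile(sorted_pos: Sequence[float], u: float) -> float:
    """Empirical quantile at level u in [0, 1], linear between order statistics."""
    m = len(sorted_pos)
    if m == 1:
        return float(sorted_pos[0])
    h = u * (m - 1)
    lo = min(math.floor(h), m - 1)
    hi = min(lo + 1, m - 1)
    return sorted_pos[lo] + (h - lo) * (sorted_pos[hi] - sorted_pos[lo])


def _nearest_free(step: int, taken: Set[int], num_steps: int) -> int:
    """Closest step index to `step` that is not already used."""
    for d in range(num_steps):
        for c in (step - d, step + d):
            if 0 <= c < num_steps and c not in taken:
                return c
    raise RuntimeError("no free step left")


def interpolate_schedule(
    reference: Sequence[int], anchor: Sequence[int], num_steps: int, budget: int
) -> List[int]:
    """Hypothetical sketch: predict a schedule with `budget` compute steps."""
    ref, anc = sorted(reference), sorted(anchor)
    k_ref, k_anc = len(ref), len(anc)
    if not (1 <= k_anc < k_ref and k_anc <= budget <= k_ref and budget <= num_steps):
        raise ValueError("budget must lie between the anchor and reference budgets")
    # Assumed linear budget-to-weight map: t = 0 -> reference, t = 1 -> anchor.
    t = (k_ref - budget) / (k_ref - k_anc)
    # Quantile levels that pin the first and last compute steps to the endpoints.
    levels = [j / (budget - 1) for j in range(budget)] if budget > 1 else [0.0]
    # Displacement interpolation: blend the two quantile functions pointwise.
    targets = [(1 - t) * quantile(ref, u) + t * quantile(anc, u) for u in levels]
    chosen: List[int] = []
    taken: Set[int] = set()
    for x in targets:
        step = min(max(math.floor(x + 0.5), 0), num_steps - 1)
        cand = _nearest_free(step, taken, num_steps)  # keep indices distinct
        chosen.append(cand)
        taken.add(cand)
    return sorted(chosen)
```

For example, with a 20-step sampler, `reference = [0, 2, ..., 18]` (10 compute steps) and `anchor = [0, 9, 19]` (3 steps), a budget of 6 yields `[0, 4, 7, 11, 15, 19]`, and budgets of 10 and 3 reproduce the reference and the anchor exactly. Interpolating quantiles rather than raw indices lets two schedules with different step counts be blended position by position.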

In practice

Achieves 4.5x acceleration on FLUX.1 [dev], 4.7x on Qwen-Image, and 3.66x on HunyuanVideo.
Consistently improves generation fidelity over state-of-the-art caching baselines.
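Reading each reported factor $s$ as a ratio of end-to-end sampling latency (an assumption; the summary does not say whether the factors measure wall-clock time or compute), the accelerated run costs $1/s$ of the baseline:

```latex
\frac{1}{4.5} \approx 0.222, \qquad \frac{1}{4.7} \approx 0.213, \qquad \frac{1}{3.66} \approx 0.273
```

That is, roughly 78%, 79%, and 73% less time on FLUX.1 [dev], Qwen-Image, and HunyuanVideo, respectively.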

Topics

Diffusion Models
Model Acceleration
Optimal Transport
Caching Schedules
Inference Optimization
Optuna

Code references

UnicomAI/OTCache

Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer


Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.