OTCache: Optimal Transport for Geometry-Aware Caching in Diffusion Models

· Source: Artificial Intelligence · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

OTCache is a training-free framework that accelerates diffusion model sampling by optimizing how caching schedules are predicted. Existing graph-based caching methods rely on an additive independence assumption that breaks down in low-NFE (number of function evaluations) regimes. OTCache instead takes an Optimal Transport (OT)-inspired view and models caching schedules as a smooth evolution in policy space. The framework has three stages: (1) establish a high-fidelity reference schedule with a graph-based method under a conservative budget; (2) run a lightweight anchor search, using Optuna with an end-to-end perceptual objective, in an extreme low-budget setting; (3) predict schedules for target budgets by quantile interpolation between the reference and anchor policies, using continuous warping representations. Released on 2026-06-30, OTCache reports 4.5x acceleration on FLUX.1 [dev], 4.7x on Qwen-Image, and 3.66x on HunyuanVideo, with consistently better generation fidelity than state-of-the-art caching baselines.

Key takeaway

For machine learning engineers optimizing diffusion model inference, OTCache offers acceleration without retraining. If existing caching methods degrade fidelity in your low-NFE settings, consider OTCache's Optimal Transport-inspired schedule prediction. It reports sampling speedups of up to 4.7x while also improving generation quality over caching baselines, which makes it a strong candidate for efficient deployment.

Key insights

OTCache accelerates diffusion model sampling by modeling caching schedules as a smooth evolution in policy space, using an Optimal Transport-inspired formulation.

Method

OTCache first establishes a reference schedule, then performs an anchor search via Optuna with a perceptual objective, and finally predicts target schedules by quantile interpolation between the reference and anchor policies.
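
The interpolation stage can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes a schedule is a sorted list of step indices at which the full model is evaluated (all other steps reuse cached features), that the interpolation weight depends linearly on the target budget, and that results are rounded to integer steps. The names `quantile` and `interpolate_schedule` are hypothetical. The sketch relies on the fact that, in one dimension, the optimal-transport interpolation between two distributions is the pointwise linear blend of their quantile functions.

```python
def quantile(schedule, u):
    """Piecewise-linear quantile function of a sorted schedule, for u in [0, 1]."""
    if len(schedule) == 1:
        return float(schedule[0])
    pos = u * (len(schedule) - 1)
    lo = min(int(pos), len(schedule) - 2)
    frac = pos - lo
    return (1 - frac) * schedule[lo] + frac * schedule[lo + 1]


def interpolate_schedule(reference, anchor, budget, num_steps):
    """Predict a schedule with `budget` compute steps between two known schedules.

    reference: high-budget schedule (sorted, strictly increasing step indices)
    anchor:    low-budget schedule (sorted, strictly increasing step indices)
    Assumption: the blend weight t is linear in the budget (illustrative only).
    """
    k_ref, k_anc = len(reference), len(anchor)
    if not k_anc <= budget <= k_ref:
        raise ValueError("budget must lie between the anchor and reference budgets")
    t = 0.0 if k_ref == k_anc else (k_ref - budget) / (k_ref - k_anc)

    steps = []
    for i in range(budget):
        u = i / (budget - 1) if budget > 1 else 0.0
        # 1D displacement interpolation: blend the two quantile functions.
        q = (1 - t) * quantile(reference, u) + t * quantile(anchor, u)
        step = int(round(q))
        # Keep indices strictly increasing after rounding.
        if steps and step <= steps[-1]:
            step = steps[-1] + 1
        # Leave room for the remaining steps inside [0, num_steps).
        step = min(step, num_steps - (budget - i))
        steps.append(step)
    return steps


if __name__ == "__main__":
    reference = list(range(0, 50, 2))   # 25 compute steps out of 50
    anchor = [0, 2, 5, 10, 20, 40]      # 6 compute steps out of 50
    print(interpolate_schedule(reference, anchor, budget=10, num_steps=50))
```

With this construction, a budget equal to the reference or anchor budget returns that schedule unchanged, and intermediate budgets yield schedules that shift gradually from one to the other.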

Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.