MoECa: Aligning Feature Reuse with Expert Decomposition in Diffusion Transformers

2026-06-14 · Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

MoECa is a fine-grained caching framework designed to accelerate Diffusion Transformers with Mixture-of-Experts (DiT-MoE) by addressing redundant computation during diffusion inference. Traditional caching methods, operating at the token level, are suboptimal for DiT-MoE due to its internal decomposition of token updates into multiple routed expert branches. Analysis revealed that cross-timestep redundancy in DiT-MoE is more effectively characterized at the expert-branch level. MoECa leverages this insight to perform branch-level feature reuse across timesteps. The framework further incorporates expert-aware adaptive control and synchronized cache updates across both MoE and attention paths to ensure stable intermediate states. Experimental results on various DiT-MoE models demonstrate that MoECa consistently achieves a superior speed-quality trade-off compared to previous caching methods, delivering up to a 2.83x inference speedup with minimal quality degradation.

Key takeaway

Machine learning engineers optimizing inference for Diffusion Transformers with Mixture-of-Experts (DiT-MoE) should consider branch-level caching strategies such as MoECa. The approach targets redundancy at the expert-branch level rather than the token level and reports speedups of up to 2.83x with minimal quality degradation. Its expert-aware adaptive control and synchronized cache updates across the MoE and attention paths are intended to keep intermediate states stable, which can make DiT-MoE deployments more efficient and cost-effective.

Key insights

MoECa accelerates DiT-MoE inference by reusing expert-branch features across timesteps, achieving up to 2.83x speedup.

Principles

Cross-timestep redundancy in DiT-MoE is best characterized at the expert-branch level, not the token level.
Fine-grained caching improves DiT-MoE speed-quality trade-off.
Synchronized cache updates maintain stable intermediate states.

Method

MoECa performs branch-level feature reuse across timesteps, integrating expert-aware adaptive control and synchronized cache updates for MoE and attention paths.
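
To make branch-level reuse concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not MoECa's actual implementation: the BranchCache class, the (token index, expert index) cache key, the refresh flag, and the toy experts are all hypothetical. It assumes routing and gate weights are recomputed at every timestep while cached expert outputs are reused whenever the same branch is selected again.

```python
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Expert = Callable[[Vector], Vector]
Route = List[Tuple[int, float]]  # (expert index, gate weight) pairs for one token


class BranchCache:
    """Hypothetical cache holding one output per (token index, expert index) branch."""

    def __init__(self) -> None:
        self.store: Dict[Tuple[int, int], Vector] = {}
        self.expert_calls = 0  # number of real expert evaluations so far

    def moe_forward(
        self,
        tokens: List[Vector],
        routes: List[Route],
        experts: List[Expert],
        refresh: bool,
    ) -> List[Vector]:
        outputs = []
        for t, (x, route) in enumerate(zip(tokens, routes)):
            y = [0.0] * len(x)
            for e, gate in route:
                key = (t, e)
                # Recompute on refresh steps or when this branch was never cached.
                if refresh or key not in self.store:
                    self.store[key] = experts[e](x)
                    self.expert_calls += 1
                branch = self.store[key]
                # Gates are fresh each step; only the branch output is reused.
                y = [a + gate * b for a, b in zip(y, branch)]
            outputs.append(y)
        return outputs


# Toy experts standing in for expert MLPs (assumption, for illustration only).
experts = [
    lambda x: [2.0 * v for v in x],
    lambda x: [v + 1.0 for v in x],
    lambda x: [-v for v in x],
]

cache = BranchCache()
# Timestep 1: full computation, token 0 is routed to experts 0 and 1.
out_full = cache.moe_forward([[1.0, 2.0]], [[(0, 0.5), (1, 0.5)]], experts, refresh=True)
calls_after_full = cache.expert_calls
# Timestep 2: reuse step, routing changes to experts 0 and 2.
out_reuse = cache.moe_forward([[1.1, 2.1]], [[(0, 0.5), (2, 0.5)]], experts, refresh=False)
calls_after_reuse = cache.expert_calls
```

In the second timestep only the newly routed branch (expert 2) is evaluated, while the branch for expert 0 comes from the cache. A token-level cache would have to either reuse or recompute the whole token update, which is the granularity mismatch the summary describes.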

In practice

Apply branch-level caching for DiT-MoE inference.
Implement expert-aware adaptive control in MoE systems.
Synchronize cache updates across MoE and attention paths.
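
The last two practices can be sketched as a small refresh scheduler. This is one possible reading, not the rule used by MoECa: the summary does not specify how adaptive control or synchronization is computed, so the per-expert drift scores, the thresholds, and the SyncedSchedule class below are hypothetical. The sketch assumes each expert accumulates a drift score between refreshes and that the attention cache is refreshed on the same step as any stale expert branch.

```python
from typing import Dict, List


def experts_to_refresh(drift: Dict[int, float], threshold: Dict[int, float]) -> List[int]:
    """Return experts whose accumulated drift exceeds their own threshold."""
    return sorted(e for e, d in drift.items() if d > threshold[e])


class SyncedSchedule:
    """Hypothetical scheduler: per-expert thresholds, attention refreshed in lockstep."""

    def __init__(self, threshold: Dict[int, float]) -> None:
        self.threshold = threshold
        self.accumulated = {e: 0.0 for e in threshold}

    def step(self, step_drift: Dict[int, float]) -> Dict[str, object]:
        # Accumulate drift since each expert's last refresh.
        for e, d in step_drift.items():
            self.accumulated[e] += d
        stale = experts_to_refresh(self.accumulated, self.threshold)
        for e in stale:
            self.accumulated[e] = 0.0  # this branch is recomputed now
        # Assumption: attention is refreshed whenever any expert branch is.
        return {"moe_experts": stale, "attention": bool(stale)}


# Expert 0 is treated as more sensitive (lower threshold) than expert 1.
schedule = SyncedSchedule({0: 0.5, 1: 1.0})
plans = [schedule.step({0: 0.3, 1: 0.3}) for _ in range(4)]
```

With these toy numbers, expert 0 is refreshed twice as often as expert 1, and the attention cache is never refreshed on a step where every MoE branch is reused, so the two paths do not drift apart.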

Topics

Diffusion Transformers
Mixture-of-Experts
DiT-MoE
Inference Optimization
Caching Frameworks
Feature Reuse

Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.