MoECa: Aligning Feature Reuse with Expert Decomposition in Diffusion Transformers

· Source: Computer Vision and Pattern Recognition · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

MoECa is a fine-grained caching framework that accelerates Diffusion Transformers with Mixture-of-Experts (DiT-MoE) by reducing redundant computation during diffusion inference. Existing caching methods operate at the token level, which is suboptimal for DiT-MoE because each token update is decomposed internally into multiple routed expert branches. The authors' analysis shows that cross-timestep redundancy in DiT-MoE is better characterized at the expert-branch level, and MoECa uses this insight to reuse features branch by branch across timesteps. The framework also incorporates expert-aware adaptive control and synchronized cache updates across both the MoE and attention paths to keep intermediate states stable. Experiments on several DiT-MoE models show that MoECa consistently achieves a better speed-quality trade-off than previous caching methods, delivering up to a 2.83x inference speedup with minimal quality degradation.

Key takeaway

Machine learning engineers optimizing inference for Diffusion Transformers with Mixture-of-Experts (DiT-MoE) should consider branch-level caching strategies such as MoECa. The approach targets redundancy at the expert-branch level and reports speedups of up to 2.83x without substantial quality degradation. Expert-aware adaptive control and synchronized cache updates keep intermediate states stable, which can make DiT-MoE deployments more efficient and cost-effective.

Key insights

MoECa accelerates DiT-MoE inference by reusing expert-branch features across timesteps, achieving up to 2.83x speedup.

Method

MoECa performs branch-level feature reuse across timesteps, integrating expert-aware adaptive control and synchronized cache updates for MoE and attention paths.
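To make the idea of branch-level reuse concrete, here is a minimal sketch in plain Python. It is not the paper's algorithm: the summary does not specify the reuse criterion, so the `BranchCache` class, the relative-change test, the gate-scaled tolerance (standing in for "expert-aware adaptive control"), and the `force_refresh` flag (standing in for a synchronized cache update) are all illustrative assumptions. The point it shows is only that each routed expert branch decides independently whether to recompute or reuse its cached output.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Vector = List[float]
Expert = Callable[[Vector], Vector]


def rel_change(new: Vector, old: Vector) -> float:
    """Relative L1 change between the current input and the cached one."""
    num = sum(abs(a - b) for a, b in zip(new, old))
    den = sum(abs(b) for b in old) + 1e-8
    return num / den


@dataclass
class BranchCache:
    """Hypothetical per-expert cache: one input/output entry per branch."""
    base_threshold: float  # assumed tolerance, not a value from the paper
    inputs: Dict[int, Vector] = field(default_factory=dict)
    outputs: Dict[int, Vector] = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def run(self, expert_id: int, expert: Expert, x: Vector,
            gate: float, force_refresh: bool = False) -> Vector:
        # Assumed heuristic: a branch with a small routing weight contributes
        # less to the output, so it tolerates more input drift before refresh.
        tolerance = self.base_threshold / max(gate, 1e-8)
        cached_in = self.inputs.get(expert_id)
        if (not force_refresh and cached_in is not None
                and rel_change(x, cached_in) < tolerance):
            self.hits += 1
            return self.outputs[expert_id]  # reuse this branch only
        y = expert(x)                       # recompute this branch only
        self.inputs[expert_id] = list(x)
        self.outputs[expert_id] = y
        self.misses += 1
        return y


def moe_layer(x: Vector, experts: Dict[int, Expert], gates: Dict[int, float],
              cache: BranchCache, force_refresh: bool = False) -> Vector:
    """Gate-weighted sum of expert branches, each cached independently."""
    out = [0.0] * len(x)
    for expert_id, gate in gates.items():
        y = cache.run(expert_id, experts[expert_id], x, gate, force_refresh)
        out = [o + gate * v for o, v in zip(out, y)]
    return out


# Two toy experts and fixed routing weights for a single token.
experts = {0: lambda x: [2.0 * v for v in x],
           1: lambda x: [v + 1.0 for v in x]}
gates = {0: 0.9, 1: 0.1}
cache = BranchCache(base_threshold=0.02)

out1 = moe_layer([1.0, 2.0], experts, gates, cache)  # timestep t: both computed
out2 = moe_layer([1.1, 2.0], experts, gates, cache)  # timestep t+1: mixed
print(cache.hits, cache.misses)
```

In this toy run the input drifts by roughly 3% between the two timesteps. The heavily weighted expert 0 exceeds its tolerance and is recomputed, while the lightly weighted expert 1 reuses its cached output, which a single token-level decision could not express. Passing `force_refresh=True` recomputes every branch in the same step, a simple stand-in for refreshing caches together.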

Best for: Research Scientist, AI Engineer, Computer Vision Engineer, AI Scientist, Machine Learning Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Computer Vision and Pattern Recognition.