DOT-MoE: Differentiable Optimal Transport for MoEfication

Source: Artificial Intelligence · Field: Technology & Digital, Artificial Intelligence & Machine Learning · Depth: Expert, quick

Summary

DOT-MoE is a framework for converting pre-trained dense Large Language Models (LLMs) into sparse Mixture of Experts (MoE) architectures, targeting the inference cost that comes with scaling LLMs. Whereas existing methods partition Feed-Forward Network (FFN) neurons by heuristic clustering or random splitting, DOT-MoE formulates the decomposition as a Differentiable Optimal Transport (DOT) problem. It uses differentiable Sinkhorn-Knopp iterations to assign neurons to experts under strict expert capacity constraints, and Straight-Through Estimators (STE) to learn the discrete neuron-to-expert assignment and the token-to-expert routing policy jointly, end-to-end. In the reported experiments, DOT-MoE significantly outperforms structured pruning, heuristic clustering, and random-split baselines, retaining 90% of the original dense model's performance while reducing active parameters by 50%.

Key takeaway

For Machine Learning Engineers optimizing LLM inference efficiency, DOT-MoE offers a way to convert pre-trained dense models into sparse MoEs. The reported result is a 50% reduction in active parameters while retaining 90% of the original model's performance. Consider this Differentiable Optimal Transport approach when you need to lower compute costs for large-scale LLM deployments.

Key insights

To make inference more efficient, DOT-MoE converts dense LLMs to sparse MoEs by framing the partitioning of FFN neurons into experts as a Differentiable Optimal Transport problem.

Principles

Method

Formulate the decomposition of dense FFN layers as a Differentiable Optimal Transport problem, using Sinkhorn-Knopp iterations to assign neurons to experts under capacity constraints. Jointly learn the discrete neuron-to-expert assignment and the token-to-expert routing via Straight-Through Estimators.
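
To make the assignment step concrete, here is a minimal sketch of capacity-constrained Sinkhorn-Knopp balancing in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the function names `sinkhorn_plan` and `hard_assign`, the toy `scores` matrix, the temperature `epsilon`, and the equal-capacity setup are all hypothetical, and the source does not say how neuron-to-expert affinities are computed.

```python
import math

def sinkhorn_plan(scores, capacity, epsilon=0.1, n_iters=200):
    """Entropy-regularized transport plan: neurons are rows, experts are columns.

    scores[i][j] is an assumed affinity of neuron i for expert j.
    Row sums approach 1 (each neuron is placed once); column sums equal
    `capacity` (each expert holds the same number of neurons).
    """
    n, m = len(scores), len(scores[0])
    assert n == m * capacity, "total capacity must equal the number of neurons"
    # Gibbs kernel. Subtracting the row max only rescales each row (the row
    # scaling below absorbs it) and keeps exp() from overflowing.
    K = [[math.exp((s - max(row)) / epsilon) for s in row] for row in scores]
    v = [1.0] * m
    for _ in range(n_iters):
        # Row scaling: every neuron distributes exactly one unit of mass.
        u = [1.0 / sum(K[i][j] * v[j] for j in range(m)) for i in range(n)]
        # Column scaling: every expert receives exactly `capacity` units.
        v = [capacity / sum(K[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * K[i][j] * v[j] for j in range(m)] for i in range(n)]

def hard_assign(matrix):
    """Index of the largest entry in each row (the discrete choice)."""
    return [max(range(len(row)), key=row.__getitem__) for row in matrix]

# Toy example: 4 neurons, 2 experts, 2 neurons per expert.
scores = [[0.9, 0.1], [0.8, 0.3], [0.2, 0.7], [0.6, 0.5]]
plan = sinkhorn_plan(scores, capacity=2)

print(hard_assign(scores))  # greedy per-neuron choice: [0, 0, 1, 0], expert 0 is over capacity
print(hard_assign(plan))    # after Sinkhorn balancing:  [0, 0, 1, 1]
```

In this toy case the greedy choice puts three neurons in expert 0, while the balanced plan moves the least committed neuron to expert 1. Every operation inside the loop is a differentiable rescaling, which is what lets gradients reach the scores. The row-wise argmax is the non-differentiable part, and it does not by itself guarantee the capacity constraint in general. With a Straight-Through Estimator, an autodiff framework would use the hard one-hot assignment in the forward pass and treat it as the soft plan in the backward pass; that part is not shown here.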

Best for: Research Scientist, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Artificial Intelligence.