EMO: Pretraining mixture of experts for emergent modularity

· Source: Hugging Face - Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Expert, medium

Summary

AllenAI has released EMO, a mixture-of-experts (MoE) model pretrained so that modularity emerges without human-defined priors. EMO has 1B active and 14B total parameters spread across 128 experts. For a specific task, users can activate only 12.5% of those experts (16 of 128) and still retain near full-model performance, with only a 3% absolute performance drop. Standard MoEs, in contrast, degrade significantly when restricted to a subset of their experts. EMO achieves this by restricting all tokens within a document to choose experts from a shared, router-selected pool during training, which encourages experts to specialize in semantic domains such as "Health, Medical & Wellness" rather than in low-level lexical patterns. The model was trained on 1 trillion tokens, and training also uses global load balancing and random document pool sizing to improve stability and flexibility.

Key takeaway

For AI engineers deploying large language models, EMO offers a practical way to reduce compute cost and memory footprint. A small, task-specific subset of experts (e.g., 12.5%) can be served from a single EMO model while keeping performance close to that of the full model, which effectively turns one model into a composable architecture. Because standard MoEs degrade significantly under the same restriction, EMO gives a better memory-accuracy tradeoff than monolithic or standard MoE systems and makes large models easier to adapt to diverse applications.
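
The idea of serving a task-specific expert subset can be sketched in a few lines. This is a minimal illustration, not EMO's actual tooling: the summary does not say how a subset is chosen, so ranking experts by how often they are selected on sample task data is an assumption, and the names `select_task_experts`, `route_with_subset`, and the routing-trace format are hypothetical.

```python
from collections import Counter


def select_task_experts(routing_trace, keep):
    """Return the `keep` most frequently chosen expert ids in a routing trace.

    `routing_trace` is a list of per-token expert-id lists, recorded while
    running the full model on sample task data (hypothetical input format).
    Selecting by usage frequency is an assumption, not a documented method.
    """
    counts = Counter(e for token_experts in routing_trace for e in token_experts)
    return sorted(e for e, _ in counts.most_common(keep))


def route_with_subset(router_scores, kept_experts, top_k):
    """Pick the top_k experts for one token, considering only kept experts."""
    ranked = sorted(kept_experts, key=lambda e: router_scores[e], reverse=True)
    return ranked[:top_k]


NUM_EXPERTS = 128
KEEP = int(NUM_EXPERTS * 0.125)  # 12.5% of 128 experts = 16, as in the summary

# Toy routing trace: 300 tokens, two experts each, drawn from experts 0..23.
trace = [[t % 24, (t * 5 + 1) % 24] for t in range(300)]
kept = select_task_experts(trace, KEEP)

# Toy router scores for a single token over all 128 experts.
scores = [((e * 37) % 101) / 101 for e in range(NUM_EXPERTS)]
chosen = route_with_subset(scores, kept, top_k=2)
```

Only the experts in `kept` would need to be held in memory at serving time. Every token is routed inside that set, which is the same kind of restriction the model sees during training.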

Key insights

EMO enables emergent modularity in MoE models, allowing task-specific expert subsets to retain near full-model performance.

Method

EMO trains MoE routers to select a shared expert pool for all tokens within a document, encouraging domain-specific expert specialization. It uses global load balancing and random document pool sizing.
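
The document-level constraint described above can be sketched as follows. This is a rough illustration under stated assumptions rather than the published training code: the summary does not specify how the router selects the shared pool, so this sketch takes the experts with the highest mean router probability over the document's tokens, and it draws the pool size uniformly at random. The function names are hypothetical, and global load balancing is not modeled.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_indices(scores, count, allowed=None):
    """Indices of the `count` highest scores, optionally limited to `allowed`."""
    candidates = range(len(scores)) if allowed is None else allowed
    return sorted(candidates, key=lambda i: scores[i], reverse=True)[:count]


def route_document(token_logits, pool_size, top_k):
    """Route every token of one document through a shared expert pool.

    Assumption: the pool is the `pool_size` experts with the highest mean
    router probability across the document's tokens.
    """
    probs = [softmax(row) for row in token_logits]
    num_experts = len(probs[0])
    mean_prob = [sum(p[e] for p in probs) / len(probs) for e in range(num_experts)]
    pool = top_indices(mean_prob, pool_size)
    # Each token still picks its own top_k experts, but only inside the pool.
    assignments = [top_indices(p, top_k, allowed=pool) for p in probs]
    return pool, assignments


def sample_pool_size(rng, num_experts, min_size):
    """Random document pool sizing (assumed uniform on [min_size, num_experts])."""
    return rng.randint(min_size, num_experts)


# Toy example: one document of 5 tokens routed over 8 experts, 2 per token.
rng = random.Random(0)
num_experts, top_k = 8, 2
doc_logits = [[rng.gauss(0.0, 1.0) for _ in range(num_experts)] for _ in range(5)]
pool_size = sample_pool_size(rng, num_experts, min_size=top_k)
pool, assignments = route_document(doc_logits, pool_size, top_k)
```

Because every token in the document shares one pool, experts are pushed to cover whatever the document is about as a whole. Varying the pool size from document to document exposes the model to both small and large pools during training.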

Best for: AI Engineer, NLP Engineer, Research Scientist, AI Scientist, Machine Learning Engineer, AI Architect

Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.