Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains
Summary
JetBrains introduced Mellum2 on June 1, 2026. It is a 12B-parameter Mixture-of-Experts (MoE) model trained from scratch on natural language and code. Only 2.5B parameters are active per token, which makes inference more than 2x faster than similarly sized models while keeping benchmark performance competitive. Released under the Apache 2.0 license, Mellum2 targets high-throughput production workloads and specialized software engineering tasks. Its key applications are routing and orchestration in multi-model systems, RAG pipelines, and sub-agent work on intermediate operations. Its efficiency and open license also make it suitable for private deployments. JetBrains positions Mellum2 as a "focal" model: a component that makes an AI system stack faster and cheaper to run.
Key takeaway
For AI engineers building software engineering systems, Mellum2 is an option for cutting latency and inference cost. If you are designing a multi-model AI stack or need to deploy models in private environments, consider Mellum2 for tasks such as routing, RAG processing, or sub-agent operations. Inference more than 2x faster than similarly sized models lowers latency and cost, and the Apache 2.0 license makes the stack easier to control.
Key insights
Mellum2 is a specialized, efficient MoE model for low-latency text and code workloads within larger AI systems.
Principles
- MoE architecture enables high capacity with efficient inference.
- Specialized models optimize performance for specific modalities.
- Multi-model AI systems benefit from "focal" components.
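The first principle can be illustrated with a toy top-k router. This is a minimal sketch of the general MoE technique, not Mellum2's actual routing: the expert count, the value of k, the gate scores, and the function names `top_k_route` and `moe_forward` are all assumptions made for illustration. Only the 12B total and 2.5B active parameter figures come from the summary.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def top_k_route(gate_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts.

    Weights are renormalized over the selected experts only.
    """
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    weights = softmax([gate_logits[i] for i in top])
    return list(zip(top, weights))

def moe_forward(x, experts, gate_logits, k):
    """Weighted sum of the outputs of the k selected experts.

    Experts that are not selected are never called, which is where
    the inference savings of an MoE layer come from.
    """
    return sum(w * experts[i](x) for i, w in top_k_route(gate_logits, k))

# Hypothetical toy experts: each one just scales its input.
experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
gate_logits = [0.1, 2.0, -1.0, 2.0]  # made-up router scores for one token

routes = top_k_route(gate_logits, k=2)
output = moe_forward(1.0, experts, gate_logits, k=2)

# Share of parameters active per token, using the summary's figures.
active_fraction = 2.5 / 12
print(routes, output, round(active_fraction, 3))
```

With these figures, roughly a fifth of the model's parameters take part in each token's forward pass, while the full 12B remain available as capacity across tokens.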
In practice
- Deploy Mellum2 for prompt classification.
- Integrate it into RAG pipelines for context compression.
- Use it for agent planning and validation tasks.
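The prompt-classification use case can be sketched as a small routing layer. This is an illustrative sketch under stated assumptions: `keyword_classifier` is a hypothetical stand-in for a call to a small, fast model, and the labels and handler names are invented for the example. None of it reflects a real Mellum2 API.

```python
from typing import Callable, Dict

def keyword_classifier(prompt: str) -> str:
    """Hypothetical stand-in for a small-model classification call.

    In a real system this function would send the prompt to a fast
    model and parse a label from its reply.
    """
    text = prompt.lower()
    if any(tok in text for tok in ("def ", "class ", "traceback", "compile")):
        return "code"
    if len(text.split()) > 40:
        return "long_form"
    return "simple"

def route(prompt: str,
          classify: Callable[[str], str],
          handlers: Dict[str, Callable[[str], str]],
          default: str = "simple") -> str:
    """Classify the prompt, then dispatch it to the matching handler.

    Unknown labels fall back to the default handler.
    """
    label = classify(prompt)
    handler = handlers.get(label, handlers[default])
    return handler(prompt)

# Hypothetical downstream targets; each returns the name of the
# model that would serve the request.
handlers = {
    "code": lambda p: "code-model",
    "long_form": lambda p: "large-model",
    "simple": lambda p: "small-model",
}

print(route("Fix this Traceback in my script", keyword_classifier, handlers))
print(route("hello", keyword_classifier, handlers))
```

Passing the classifier in as a callable keeps the dispatch logic independent of which model does the classifying, so the stub can be swapped for a real model call without touching `route`.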
Topics
- Mixture-of-Experts
- Large Language Models
- Code Generation
- AI System Architecture
- RAG Pipelines
- Low-Latency Inference
- JetBrains Mellum2
Best for: AI Architect, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.