Introducing Mellum2: A 12B Mixture-of-Experts Model by JetBrains

2026-06-01 · Source: Hugging Face - Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering · Depth: Intermediate, quick

Summary

On June 1, 2026, JetBrains introduced Mellum2, a 12B-parameter Mixture-of-Experts (MoE) model trained from scratch on natural language and code. The model activates only 2.5B parameters per token, which gives it low-latency inference that is more than 2x faster than similarly sized models while keeping benchmark performance competitive. Released under the Apache 2.0 license, Mellum2 targets high-throughput production workloads and specialized software engineering tasks. Its key applications include routing and orchestration in multi-model systems, supporting RAG pipelines, and acting as a sub-agent for intermediate operations. Its efficiency and open license also make it suitable for private deployments. Mellum2 is positioned as a "focal" model, one that optimizes AI system stacks for speed and cost.

Key takeaway

For AI engineers building software engineering systems, Mellum2 is a practical option for improving efficiency and reducing latency. If you are designing multi-model AI stacks or need to deploy models in private environments, consider Mellum2 for tasks such as routing, RAG processing, or sub-agent operations. Its more than 2x faster inference and Apache 2.0 license can lower your system's latency and cost and make the stack easier to control.

Key insights

Mellum2 is a specialized, efficient MoE model for low-latency text and code workloads within larger AI systems.

Principles

MoE architecture enables high capacity with efficient inference.
Specialized models optimize performance for specific modalities.
Multi-model AI systems benefit from "focal" components.
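To make the first principle concrete, here is a minimal sketch of top-k expert routing, the general mechanism by which an MoE layer runs only a few experts per token. It is an illustration under stated assumptions, not Mellum2's implementation: the expert count, the value of k, and the toy "experts" are all hypothetical, since the summary does not describe the model's internal configuration. The only figures taken from the article are 12B total and 2.5B active parameters.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k):
    """Return (expert_index, weight) pairs for the k highest-scoring experts."""
    order = sorted(range(len(router_logits)),
                   key=lambda i: router_logits[i], reverse=True)
    top = order[:k]
    # Renormalize over the selected experts only, so weights sum to 1.
    weights = softmax([router_logits[i] for i in top])
    return list(zip(top, weights))


def moe_forward(x, experts, router_logits, k):
    """Run only the selected experts and combine their outputs by weight."""
    routes = top_k_route(router_logits, k)
    return sum(w * experts[i](x) for i, w in routes), routes


# Hypothetical configuration for illustration (not Mellum2's real one).
NUM_EXPERTS = 8
TOP_K = 2
# Toy experts: expert i just scales its input by (i + 1).
experts = [lambda x, scale=i + 1: scale * x for i in range(NUM_EXPERTS)]

random.seed(0)
logits = [random.gauss(0, 1) for _ in range(NUM_EXPERTS)]  # stand-in router scores
output, routes = moe_forward(1.0, experts, logits, TOP_K)
print("experts used for this token:", [i for i, _ in routes])

# Figures from the article: 2.5B of 12B parameters are active per token.
active_fraction = 2.5 / 12
print(f"active share of parameters: {active_fraction:.0%}")
```

With the article's numbers, roughly 21% of the parameters take part in each token's forward pass, which is where the inference savings come from: all experts are stored, but only the routed ones are computed.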

In practice

Deploy Mellum2 for prompt classification.
Integrate it into RAG pipelines for context compression.
Use it for agent planning and validation tasks.
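The prompt-classification use case above can be sketched as a small dispatcher: a fast classifier labels each prompt, and the label decides which downstream handler receives it. Everything named here is hypothetical. The labels, the handlers, and keyword_classifier (a keyword stub standing in for a call to a small, fast model) are assumptions for illustration; the summary does not specify how Mellum2 is invoked, so no model API is shown.

```python
from typing import Callable, Dict


def keyword_classifier(prompt: str) -> str:
    """Stub classifier. In a real system this would be a call to a small,
    low-latency model; keyword matching is used here only so the sketch runs."""
    text = prompt.lower()
    if any(tok in text for tok in ("def ", "class ", "stack trace", "compile")):
        return "code"
    if any(tok in text for tok in ("according to", "in the docs", "cite")):
        return "retrieval"
    return "general"


def route(prompt: str,
          classify: Callable[[str], str],
          handlers: Dict[str, Callable[[str], str]],
          default: str = "general") -> str:
    """Classify the prompt, then dispatch it to the matching handler.
    Unknown labels fall back to the default handler."""
    label = classify(prompt)
    handler = handlers.get(label, handlers[default])
    return handler(prompt)


# Hypothetical downstream handlers; each would wrap a different model or pipeline.
handlers = {
    "code": lambda p: f"[code-model] {p}",
    "retrieval": lambda p: f"[retrieval-model] {p}",
    "general": lambda p: f"[general-model] {p}",
}

print(route("Why does this fail to compile?", keyword_classifier, handlers))
```

Passing the classifier in as a callable keeps the dispatcher independent of any one model, so the stub can be swapped for a real model call without changing the routing logic.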

Topics

Mixture-of-Experts
Large Language Models
Code Generation
AI System Architecture
RAG Pipelines
Low-Latency Inference
JetBrains Mellum2

Best for: AI Architect, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Hugging Face - Blog.