MiniMax M3 Just Killed Closed-Source Models
Summary
MiniMax shipped its M3 model on June 1, 2026, positioning it as a frontier-class, open-weights coding model with a 1M-token context window. M3 is priced at approximately 5–10% of rivals such as GPT-5.5 and Gemini 3.1 Pro. This cost efficiency stems from a novel attention mechanism, MiniMax Sparse Attention (MSA), which challenges the assumption that next-token prediction over long sequences must attend to the entire past context. The author notes that most benchmarks are MiniMax's own and that the model's weights are not yet available for independent verification, so the analysis focuses on the mechanism and the reported numbers.
Key takeaway
For Machine Learning Engineers evaluating long-context models, MiniMax M3's introduction signals a shift towards more cost-effective open-weights solutions. Investigate sparse attention architectures for your own model development, especially when you need large context windows without prohibitive inference costs. Treat M3 as a reference point for future open-weights coding models, and prepare to test its performance once the weights become available.
Key insights
MiniMax M3 suggests that sparse attention can enable cost-effective, long-context, open-weights coding models, pending independent verification of its reported results.
Principles
- Long context doesn't require full past attention.
- Sparse attention improves inference speed and cost.
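The two principles above can be made concrete with a minimal sketch of block-level sparse attention for a single decoding step. This is a generic illustration and not MiniMax's MSA, whose internals are not described in this summary. The function names, the mean-key block summary, and the `block_size` and `top_blocks` parameters are all assumptions chosen for clarity.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, keys, values):
    """Dense softmax attention of one query over every given key/value."""
    scale = 1.0 / math.sqrt(len(query))
    weights = softmax([dot(query, k) * scale for k in keys])
    return [sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))]


def block_sparse_attend(query, keys, values, block_size, top_blocks):
    """Attend only to tokens inside the highest-scoring blocks (hypothetical scheme)."""
    starts = range(0, len(keys), block_size)
    # Assumption: each block is summarized by its mean key. A real system
    # would cache these summaries instead of recomputing them every step.
    summaries = [
        [sum(col) / len(col) for col in zip(*keys[s:s + block_size])]
        for s in starts
    ]
    block_scores = [dot(query, m) for m in summaries]
    ranked = sorted(range(len(summaries)),
                    key=lambda b: block_scores[b], reverse=True)
    chosen = sorted(ranked[:top_blocks])
    # Expand the chosen blocks back into token indices, in original order.
    token_ids = [i for b in chosen
                 for i in range(b * block_size,
                                min((b + 1) * block_size, len(keys)))]
    output = attend(query,
                    [keys[i] for i in token_ids],
                    [values[i] for i in token_ids])
    return output, token_ids
```

With n cached tokens, the dense path scores all n keys, while this sparse path scores roughly n / block_size block summaries plus top_blocks * block_size selected keys. When `top_blocks` covers every block, the sparse result matches the dense one, which makes the trade-off easy to check on small inputs.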
In practice
- Get frontier-class coding at roughly 5–10% of the price of rivals such as GPT-5.5 and Gemini 3.1 Pro, per MiniMax's reported numbers.
- Utilize 1M-token context windows.
- Explore open-weights alternatives.
Topics
- MiniMax M3
- Sparse Attention
- Open-weights Models
- Coding Models
- Long Context
- AI Inference Cost
Best for: CTO, VP of Engineering/Data, AI Engineer, AI Scientist, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI - Medium.