MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding
Summary
MiniMax has launched MiniMax M3, built on a new MiniMax Sparse Attention (MSA) architecture designed to make long-context processing more efficient. M3 supports a 1-million-token context window and, at that length, uses 1/20th the per-token compute of its predecessor, which translates to over 9x faster prefill and 15x faster decoding. The model is natively multimodal: it was trained from the start on 100 trillion tokens of interleaved text, image, and video data, and it handles multiple input types as well as desktop operations. On agentic coding and tool-use benchmarks it scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas, and 70.06% on OSWorld-Verified. M3 also autonomously optimized an FP8 GEMM kernel on NVIDIA Hopper GPUs, achieving a 9.4x speedup. The API is live, with open weights and a technical report expected within 10 days.
Key takeaway
For AI engineers building applications that need very long context or multimodal understanding, MiniMax M3 is a compelling new option. Its MSA architecture delivers large speedups on long-context tasks, making full-codebase agents and long-document pipelines more practical and cost-effective. Evaluate M3's API for your next project, especially given its strong coding benchmarks and upcoming open weights, but note its weaker PostTrainBench result for ML research automation.
Key insights
MiniMax M3 combines the MSA architecture, which sharply improves long-context efficiency, with native multimodality and advanced agentic coding.
Principles
- Sparse attention reduces long-context compute.
- Native multimodality enhances model versatility.
- Autonomous agents can optimize hardware kernels.
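The first principle can be illustrated with a toy cost model. This is only a sketch of generic sparse attention, not a description of MSA, whose internals are not covered here. It assumes dense attention scores every token in the context for each new token, while a sparse scheme scores at most a fixed budget of selected tokens. The function names and the `BUDGET` value are hypothetical; the budget is chosen purely so the ratio at 1M tokens lines up with the reported 1/20th figure, and real per-token compute also includes costs this model ignores.

```python
def dense_scores_per_token(context_len: int) -> int:
    """Attention scores computed per new token when every prior token is attended."""
    return context_len


def sparse_scores_per_token(context_len: int, budget: int) -> int:
    """Attention scores per new token when at most `budget` tokens are attended."""
    return min(context_len, budget)


CONTEXT_LEN = 1_000_000  # context window cited in the summary
BUDGET = 50_000          # hypothetical selection budget, NOT a reported MSA parameter

for n in (10_000, 100_000, CONTEXT_LEN):
    dense = dense_scores_per_token(n)
    sparse = sparse_scores_per_token(n, BUDGET)
    print(f"context={n:>9,}  dense={dense:>9,}  sparse={sparse:>7,}  ratio={sparse / dense:.2f}")
```

Under these assumptions the sparse cost stops growing once the context exceeds the budget, so the savings widen as the context gets longer.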
Method
M3 demonstrated long-horizon autonomous iteration by optimizing an FP8 GEMM kernel over 24 hours, making 147 benchmark submissions and 1,959 tool calls without human intervention.
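To put those figures in perspective, the short calculation below derives average rates from the three reported numbers. It is a back-of-the-envelope sketch that assumes the 24 hours is wall-clock time and that activity was spread evenly, which the source does not state; the function and key names are illustrative.

```python
# Figures reported for the kernel-optimization run.
HOURS = 24
SUBMISSIONS = 147
TOOL_CALLS = 1959


def run_rates(hours: float, submissions: int, tool_calls: int) -> dict:
    """Return average activity rates for an autonomous run (assumes even pacing)."""
    return {
        "tool_calls_per_hour": tool_calls / hours,
        "submissions_per_hour": submissions / hours,
        "tool_calls_per_submission": tool_calls / submissions,
        "minutes_per_submission": hours * 60 / submissions,
    }


rates = run_rates(HOURS, SUBMISSIONS, TOOL_CALLS)
for name, value in rates.items():
    print(f"{name}: {value:.1f}")
```

On average this works out to roughly 82 tool calls per hour, about 13 tool calls per benchmark submission, and one submission a little under every 10 minutes.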
In practice
- Deploy full-codebase agents efficiently.
- Process extensive long-document pipelines.
- Automate hardware kernel optimization.
Topics
- MiniMax M3
- MSA Architecture
- Large Language Models
- Multimodality
- Agentic AI
- Code Generation
- Long Context
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, AI Scientist
Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.