MiniMax Releases MiniMax M3 with MSA Architecture Supporting 1M-Token Context, Native Multimodality, and Agentic Coding

2026-06-01 · Source: Machine Learning ML & Generative AI News · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems · Depth: Advanced, quick

Summary

MiniMax has launched MiniMax M3, built on a new MiniMax Sparse Attention (MSA) architecture designed to make long-context processing more efficient. The model supports a 1-million-token context window. At that length it uses 1/20th the per-token compute of its predecessor, with prefill over 9x faster and decoding 15x faster. M3 is natively multimodal: it was trained from the start on 100 trillion tokens of interleaved text, image, and video data, and it supports various inputs and desktop operations. On agentic coding benchmarks, it scores 59.0% on SWE-Bench Pro, 66.0% on Terminal-Bench 2.1, 74.2% on MCP Atlas, and 70.06% on OSWorld-Verified. M3 also autonomously optimized an FP8 GEMM kernel on NVIDIA Hopper GPUs, achieving a 9.4x speedup. The API is live, with open weights and a technical report expected within 10 days.

Key takeaway

For AI engineers building applications that need long context or multimodal understanding, MiniMax M3 is a compelling new option. Its MSA architecture delivers large speedups on long-context tasks (over 9x faster prefill and 15x faster decoding at 1M tokens), which makes full-codebase agents and long-document pipelines more practical and cost-effective. The API is worth evaluating for your next project, especially given the strong coding benchmarks and upcoming open weights, but note its lower PostTrainBench score for ML research automation.

Key insights

MiniMax M3 combines the MSA architecture, which cuts per-token compute at a 1M-token context to 1/20th of its predecessor's, with native multimodality and agentic coding.

Principles

Sparse attention reduces long-context compute.
Native multimodality enhances model versatility.
Autonomous agents can optimize hardware kernels.
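
As a rough illustration of the first principle, the sketch below counts how many query-key pairs causal attention scores at a 1M-token context, with and without a fixed per-query key budget. It is a generic sparse-attention cost model, not MSA itself: the summary does not describe how MSA selects keys, and the function names and the 50,000-key budget are assumptions chosen only to show the scaling. It also ignores the cost of choosing which keys to keep.

```python
def dense_attention_pairs(seq_len: int) -> int:
    """Query-key pairs scored by causal dense attention.

    Each token attends to itself and every earlier token.
    """
    return seq_len * (seq_len + 1) // 2


def sparse_attention_pairs(seq_len: int, budget: int) -> int:
    """Pairs scored when each query attends to at most `budget` keys.

    `budget` is a hypothetical per-query cap, not a documented MSA parameter.
    """
    if seq_len <= budget:
        # No query ever exceeds the cap, so the cost equals the dense cost.
        return dense_attention_pairs(seq_len)
    # The first `budget` queries are effectively dense; the rest hit the cap.
    return dense_attention_pairs(budget) + (seq_len - budget) * budget


if __name__ == "__main__":
    seq_len = 1_000_000
    budget = 50_000  # assumed value for illustration only
    dense = dense_attention_pairs(seq_len)
    sparse = sparse_attention_pairs(seq_len, budget)
    print(f"dense pairs:  {dense:,}")
    print(f"sparse pairs: {sparse:,}")
    print(f"reduction:    {dense / sparse:.1f}x")
```

Dense cost grows quadratically with context length, while the capped variant grows linearly once the context exceeds the budget, which is why the gap widens at very long contexts.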

Method

M3 demonstrated long-horizon autonomous iteration by optimizing an FP8 GEMM kernel over 24 hours, making 147 benchmark submissions and 1,959 tool calls without human intervention.
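
The reported figures imply an average pacing for the run, worked out below. This is simple arithmetic on the numbers above; the helper name is hypothetical, and the run's actual pacing, which may have been uneven, is not given in the summary.

```python
def run_averages(hours: float, submissions: int, tool_calls: int) -> tuple[float, float]:
    """Return (tool calls per submission, minutes per submission) as averages."""
    calls_per_submission = tool_calls / submissions
    minutes_per_submission = hours * 60 / submissions
    return calls_per_submission, minutes_per_submission


# Figures as reported for the FP8 GEMM kernel optimization run.
calls, minutes = run_averages(hours=24, submissions=147, tool_calls=1_959)
print(f"{calls:.1f} tool calls per submission")   # 13.3
print(f"{minutes:.1f} minutes per submission")    # 9.8
```

On these figures, the run averaged roughly 13 tool calls per benchmark submission and one submission about every 10 minutes.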

In practice

Deploy full-codebase agents efficiently.
Process extensive long-document pipelines.
Automate hardware kernel optimization.

Topics

MiniMax M3
MSA Architecture
Large Language Models
Multimodality
Agentic AI
Code Generation
Long Context

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, AI Scientist

Editorial summary, takeaway, and curation by AIssential. Original article published by Machine Learning ML & Generative AI News.