TAI #185: China's Open-Weight Holiday Blitz; GLM 4.7, Minimax M2.1 & MAI-UI
Summary
The holiday period brought a burst of model releases and M&A deals, led by Chinese and open-source developers. Z.ai launched GLM 4.7, a large-scale Mixture-of-Experts (MoE) model that uses "Interleaved Thinking" and "Preserved Thinking" to support complex agentic workflows; it scores 85.7% on GPQA-Diamond and 42.8% on Humanity's Last Exam (HLE) with tools. Minimax introduced M2.1, a 229-billion-parameter MoE model optimized for software production and multi-language programming, which scores 88.6% on the VIBE benchmark. Alibaba's Tongyi group released MAI-UI, a family of GUI agents (2B to 235B parameters) that interact directly with the screen through a device-cloud collaboration system, reaching a 76.7% success rate on AndroidWorld. Independent benchmarks from Artificial Analysis place GLM 4.7 first among open-weight models with an Intelligence Index score of 68. Minimax M2.1 has a much lower inference cost, at ~$128 versus ~$334 for GLM 4.7 (roughly 2.6 times cheaper). Both models show a "grounding gap": they hallucinate at higher rates than proprietary models.
Key takeaway
For CTOs and VPs of Engineering evaluating AI model adoption, high-performing open-weight models such as GLM 4.7 and Minimax M2.1 offer a compelling cost-performance trade-off: near-frontier reasoning at a fraction of the cost of proprietary APIs. Their higher hallucination rates mean you must pair them with robust verification systems. Consider a bifurcated strategy that uses "Thinker" models for complex reasoning and "Doer" models for high-volume, cost-sensitive execution to make the most of your inference budget.
Key insights
Open-weight models are closing the intelligence gap with proprietary models, but their reliability gap is widening: they hallucinate more often.
Principles
- Agentic models benefit from persistent reasoning chains.
- Efficiency and cost drive application-layer model design.
- Direct GUI interaction simplifies agent deployment.
Method
GLM 4.7 uses "Interleaved Thinking" and "Preserved Thinking" to maintain reasoning chains. MAI-UI employs a device-cloud collaboration system to route tasks and reduce cloud calls for GUI agents.
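The summary does not describe how the reasoning chain is maintained. As a rough illustration only, one common way to keep a chain alive across tool calls is to retain the model's earlier reasoning text in the conversation history instead of discarding it after each turn. The sketch below assumes that approach; `Turn`, `History`, and the `reasoning` field are hypothetical names, not part of any GLM 4.7 API.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Turn:
    role: str            # "user", "assistant", or "tool"
    content: str
    reasoning: str = ""  # hypothetical field: reasoning text produced with this turn


@dataclass
class History:
    turns: list[Turn] = field(default_factory=list)

    def add(self, turn: Turn) -> None:
        self.turns.append(turn)

    def to_messages(self, preserve_reasoning: bool) -> list[dict]:
        """Serialize the history for the next model call.

        With preserve_reasoning=True, earlier reasoning is sent back so the
        model can build on it; with False, it is dropped and the model has
        to re-derive its plan from the visible content alone.
        """
        messages = []
        for turn in self.turns:
            message = {"role": turn.role, "content": turn.content}
            if preserve_reasoning and turn.reasoning:
                message["reasoning"] = turn.reasoning
            messages.append(message)
        return messages
```

The trade-off in a design like this is context length: keeping reasoning text makes each request larger, in exchange for not repeating work on every step of a long agentic task.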
In practice
- Route complex planning to "Thinker" models like GLM 4.7.
- Use "Doer" models like Minimax M2.1 for coding tasks.
- Explore GUI agents for automating legacy applications.
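The first two recommendations can be sketched as a small router. This is a minimal illustration, not a production design: the model labels are placeholders rather than real API model identifiers, the `Task` type and its `kind` values are hypothetical, and the cost figures are the ~$334 and ~$128 quoted in the summary, used only to compare the two models relative to each other.

```python
from dataclasses import dataclass

# Placeholder labels for illustration; not real API model identifiers.
THINKER = "glm-4.7"
DOER = "minimax-m2.1"

# Figures quoted in the summary (~$334 vs ~$128), used only as relative weights.
RELATIVE_COST = {THINKER: 334.0, DOER: 128.0}


@dataclass
class Task:
    description: str
    kind: str  # hypothetical categories: "planning" or "coding"


def route(task: Task) -> str:
    """Send complex planning to the Thinker and everything else to the Doer."""
    if task.kind == "planning":
        return THINKER
    return DOER


def cost_ratio() -> float:
    """How many times more the Thinker costs than the Doer, per the quoted figures."""
    return RELATIVE_COST[THINKER] / RELATIVE_COST[DOER]


if __name__ == "__main__":
    for task in (Task("Design the migration plan", "planning"),
                 Task("Implement the CSV parser", "coding")):
        print(f"{task.description} -> {route(task)}")
    print(f"Thinker/Doer cost ratio: {cost_ratio():.2f}")
```

Because both models are reported to hallucinate more than proprietary ones, a real deployment would likely add a verification step after either route; that step is left out here to keep the sketch short.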
Topics
- Open-Weight Models
- AI Agents
- Model Benchmarking
- AI Hardware
- Mixture-of-Experts
Best for: Investor, CTO, VP of Engineering/Data, AI Engineer, AI Product Manager, AI Researcher
Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI Newsletter.