TAI #185: China's Open-Weight Holiday Blitz; GLM 4.7, Minimax M2.1 & MAI-UI

· Source: Towards AI Newsletter · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Data Science & Analytics · Depth: Intermediate, long

Summary

The AI landscape saw significant activity over the holiday period, particularly from Chinese and open-source developers, with major model releases and M&A deals. Z.ai launched GLM 4.7, a large-scale Mixture-of-Experts (MoE) model featuring "Interleaved Thinking" and "Preserved Thinking" for complex agentic workflows; it scores 85.7% on GPQA-Diamond and 42.8% on Humanity's Last Exam (HLE) with tools. Minimax introduced M2.1, a 229-billion-parameter MoE model optimized for software production and multi-language programming, which scores 88.6% on the VIBE benchmark. Alibaba's Tongyi group released MAI-UI, a family of GUI agents (2B to 235B parameters) that interact directly with the screen and use a device-cloud collaboration system; it reaches a 76.7% success rate on AndroidWorld. Independent benchmarks from Artificial Analysis show GLM 4.7 leading open-weight models with an Intelligence Index score of 68. Minimax M2.1 has a much lower inference cost, at ~$128 versus ~$334 for GLM 4.7. Both models exhibit a "grounding gap": they hallucinate at higher rates than proprietary models.
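To put the two cost figures in proportion, the arithmetic can be worked through in a few lines of Python. This is only a back-of-the-envelope sketch: the two dollar amounts are the approximate figures quoted in the summary above, and the variable names are illustrative.

```python
# Approximate inference costs quoted in the summary (USD).
m2_1_cost_usd = 128.0
glm_4_7_cost_usd = 334.0

# How many times more expensive GLM 4.7 is than Minimax M2.1.
cost_ratio = glm_4_7_cost_usd / m2_1_cost_usd

# Fraction of the GLM 4.7 cost saved by running M2.1 instead.
savings_fraction = 1 - m2_1_cost_usd / glm_4_7_cost_usd

print(f"GLM 4.7 costs about {cost_ratio:.2f}x as much as M2.1")
print(f"M2.1 is about {savings_fraction:.1%} cheaper")
```

On these figures GLM 4.7 costs roughly 2.6 times as much as M2.1, so M2.1 is a little over 60% cheaper.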

Key takeaway

For CTOs and VPs of Engineering evaluating AI model adoption, the emergence of high-performing open-weight models like GLM 4.7 and Minimax M2.1 presents a compelling cost-performance trade-off. While these models offer near-frontier reasoning capabilities at a fraction of the cost of proprietary APIs, you must implement robust verification systems to mitigate their higher hallucination rates. Consider a bifurcated strategy, leveraging "Thinker" models for complex reasoning and "Doer" models for high-volume, cost-sensitive execution to optimize your inference budget and application development.
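The bifurcated strategy can be sketched as a small router. This is a minimal illustration, not a recommended production design: the model names are placeholders, and the `needs_multistep_reasoning` flag is a hypothetical signal that the calling application would have to supply (for example from a task type or a cheap classifier).

```python
from dataclasses import dataclass

# Placeholder identifiers, not real model or API names.
THINKER_MODEL = "thinker-model"
DOER_MODEL = "doer-model"


@dataclass(frozen=True)
class Task:
    prompt: str
    # Hypothetical flag set by the caller; how it is derived is out of scope here.
    needs_multistep_reasoning: bool


def choose_model(task: Task) -> str:
    """Send complex reasoning to the Thinker, everything else to the cheaper Doer."""
    return THINKER_MODEL if task.needs_multistep_reasoning else DOER_MODEL


tasks = [
    Task("Plan a migration of the billing service", needs_multistep_reasoning=True),
    Task("Rename this variable across the file", needs_multistep_reasoning=False),
]
assignments = [choose_model(t) for t in tasks]
print(assignments)
```

Because of the higher hallucination rates noted above, output from either tier would still need to pass through whatever verification step the application uses before it is acted on.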

Key insights

Open-weight models are closing the intelligence gap with proprietary models, but the reliability gap is widening: they hallucinate more often than their proprietary counterparts.

Method

GLM 4.7 uses "Interleaved Thinking" and "Preserved Thinking" to maintain reasoning chains. MAI-UI employs a device-cloud collaboration system to route tasks and reduce cloud calls for GUI agents.
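The idea of routing between device and cloud can be illustrated with a simple fallback pattern. This is an assumption-laden sketch and not MAI-UI's actual mechanism: the confidence score, the `min_confidence` threshold, and both agent callables are hypothetical stand-ins chosen only to show how handling a step locally avoids a cloud call.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass(frozen=True)
class RoutedStep:
    action: str
    used_cloud: bool


def route_step(
    observation: str,
    device_agent: Callable[[str], Tuple[str, float]],  # returns (action, confidence)
    cloud_agent: Callable[[str], str],                 # returns action
    min_confidence: float = 0.8,                       # hypothetical threshold
) -> RoutedStep:
    """Try the on-device agent first; call the cloud only when it is unsure."""
    action, confidence = device_agent(observation)
    if confidence >= min_confidence:
        return RoutedStep(action, used_cloud=False)
    return RoutedStep(cloud_agent(observation), used_cloud=True)


# Stub agents for illustration only.
def stub_device_agent(observation: str) -> Tuple[str, float]:
    if "button" in observation:
        return ("tap", 0.95)
    return ("unknown", 0.30)


def stub_cloud_agent(observation: str) -> str:
    return "scroll"


easy = route_step("screen with a Submit button", stub_device_agent, stub_cloud_agent)
hard = route_step("unfamiliar settings screen", stub_device_agent, stub_cloud_agent)
print(easy, hard)
```

In this sketch the first step stays on the device and only the second reaches the cloud, which is the cost-saving behavior the device-cloud split is meant to provide.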

Best for: Investor, CTO, VP of Engineering/Data, AI Engineer, AI Product Manager, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI Newsletter.