TAI #185: China's Open-Weight Holiday Blitz; GLM 4.7, Minimax M2.1 & MAI-UI

2024-09-10 · Source: Towards AI Newsletter · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Data Science & Analytics · Depth: Intermediate, long

Summary

The holiday period saw significant AI activity, particularly from Chinese and open-source developers, including major model releases and M&A deals. Z.ai launched GLM 4.7, a large-scale Mixture-of-Experts (MoE) model whose "Interleaved Thinking" and "Preserved Thinking" features target complex agentic workflows; it scores 85.7% on GPQA-Diamond and 42.8% on Humanity's Last Exam (HLE) with tools. Minimax introduced M2.1, a 229-billion-parameter MoE model optimized for software production and multi-language programming, which scores 88.6% on the VIBE benchmark. Alibaba's Tongyi group released MAI-UI, a family of GUI agents (2B to 235B parameters) that interact directly with the screen and use a device-cloud collaboration system, reaching a 76.7% success rate on AndroidWorld. Independent benchmarks from Artificial Analysis show GLM 4.7 leading open-weight models with an Intelligence Index score of 68. Minimax M2.1 has a much lower reported inference cost, at ~$128 versus ~$334 for GLM 4.7. Both models exhibit a "grounding gap": they hallucinate more often than proprietary models.
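To put the two cost figures side by side, here is a small arithmetic sketch. It uses only the approximate dollar amounts quoted in the summary; the summary does not say what workload those figures cover, so the result is a relative comparison only. The function names are illustrative.

```python
# Approximate inference costs quoted in the summary, in US dollars.
# What workload they cover is not stated in the summary.
GLM_47_COST = 334.0
MINIMAX_M21_COST = 128.0


def cost_ratio(expensive: float, cheap: float) -> float:
    """How many times more the expensive option costs."""
    return expensive / cheap


def relative_saving(expensive: float, cheap: float) -> float:
    """Fraction of the expensive option's cost saved by the cheap one."""
    return 1.0 - cheap / expensive


ratio = cost_ratio(GLM_47_COST, MINIMAX_M21_COST)
saving = relative_saving(GLM_47_COST, MINIMAX_M21_COST)

print(f"GLM 4.7 costs about {ratio:.1f}x as much as Minimax M2.1")
print(f"Minimax M2.1 is about {saving:.0%} cheaper")
```

On these figures, GLM 4.7 costs roughly 2.6 times as much, which is the gap a cost-sensitive deployment would weigh against its higher Intelligence Index score.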

Key takeaway

For CTOs and VPs of Engineering evaluating AI model adoption, the emergence of high-performing open-weight models like GLM 4.7 and Minimax M2.1 presents a compelling cost-performance trade-off. While these models offer near-frontier reasoning capabilities at a fraction of the cost of proprietary APIs, you must implement robust verification systems to mitigate their higher hallucination rates. Consider a bifurcated strategy, leveraging "Thinker" models for complex reasoning and "Doer" models for high-volume, cost-sensitive execution to optimize your inference budget and application development.

Key insights

Open-weight models are closing the intelligence gap with proprietary models, but a reliability gap remains: they hallucinate more often than their proprietary counterparts.

Principles

Agentic models benefit from persistent reasoning chains.
Efficiency and cost drive application-layer model design.
Direct GUI interaction simplifies agent deployment.

Method

GLM 4.7 uses "Interleaved Thinking" and "Preserved Thinking" to maintain reasoning chains. MAI-UI employs a device-cloud collaboration system to route tasks and reduce cloud calls for GUI agents.
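The summary does not describe how MAI-UI decides which tasks stay on the device, so the following is only a minimal sketch of one way a device-cloud router could cut cloud calls. It assumes, purely for illustration, that each agent returns an action plus a confidence score and that low confidence triggers escalation. `Agent`, `route_step`, and the threshold value are all hypothetical names and choices, not part of MAI-UI.

```python
from typing import Callable, Tuple

# Hypothetical agent interface: takes a textual screen description and
# returns (proposed_action, confidence in [0, 1]). Not MAI-UI's real API.
Agent = Callable[[str], Tuple[str, float]]


def route_step(
    screen: str,
    on_device: Agent,
    cloud: Agent,
    threshold: float = 0.8,  # assumed cut-off, chosen for illustration
) -> Tuple[str, str]:
    """Try the small on-device agent first; call the cloud only if unsure.

    Returns (action, source) where source is "device" or "cloud".
    """
    action, confidence = on_device(screen)
    if confidence >= threshold:
        return action, "device"
    # Low confidence: escalate this step to the larger cloud agent.
    action, _ = cloud(screen)
    return action, "cloud"


# Stub agents standing in for real models.
def small_agent(screen: str) -> Tuple[str, float]:
    if "login" in screen:
        return "tap(login_button)", 0.95
    return "noop", 0.30


def large_agent(screen: str) -> Tuple[str, float]:
    return "scroll(down)", 0.90


print(route_step("login screen", small_agent, large_agent))
print(route_step("unfamiliar settings page", small_agent, large_agent))
```

With this shape, easy steps never leave the device, and the number of cloud calls depends on how often the small agent falls below the threshold.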

In practice

Route complex planning to "Thinker" models like GLM 4.7.
Use "Doer" models like Minimax M2.1 for coding tasks.
Explore GUI agents for automating legacy applications.
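The first two recommendations above can be sketched as a simple task router. This is a minimal illustration, not a production design: the model identifier strings, the `Task` type, and the set of task kinds treated as "planning" are all assumptions, and a real system would call whichever provider API serves each model.

```python
from dataclasses import dataclass

# Hypothetical model identifiers; use whatever names your provider exposes.
THINKER_MODEL = "glm-4.7"
DOER_MODEL = "minimax-m2.1"

# Assumed task kinds that count as complex planning.
PLANNING_KINDS = {"planning", "multi_step_reasoning"}


@dataclass(frozen=True)
class Task:
    kind: str    # e.g. "planning", "coding"
    prompt: str


def choose_model(task: Task) -> str:
    """Send complex planning to the Thinker, everything else to the Doer."""
    if task.kind in PLANNING_KINDS:
        return THINKER_MODEL
    return DOER_MODEL


tasks = [
    Task("planning", "Break this migration into ordered steps."),
    Task("coding", "Write a function that parses ISO 8601 dates."),
]
for task in tasks:
    print(f"{task.kind} -> {choose_model(task)}")
```

Keeping the routing rule in one small function makes it easy to add the verification step the takeaway calls for, for example by checking Doer output before it is accepted.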

Topics

Open-Weight Models
AI Agents
Model Benchmarking
AI Hardware
Mixture-of-Experts

Best for: Investor, CTO, VP of Engineering/Data, AI Engineer, AI Product Manager, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by Towards AI Newsletter.