Z.ai GLM-5: New SOTA Open Weights LLM

Source: AINews · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Software Development & Engineering, Cloud Computing & IT Infrastructure) · Depth: Advanced, extended

Summary

Z.ai has launched GLM-5, a significant update to its large language model. Total parameters grow from 355B to 744B (active parameters from 32B to 40B), and pre-training data grows from 23T to 28.5T tokens. Positioned as an Opus-class model, GLM-5 integrates DeepSeek Sparse Attention (DSA) to reduce deployment costs while preserving long-context capacity, and offers a 200K-token context window with up to 128K output tokens. It claims top scores on BrowseComp and Vending Bench 2 and leads the GDPVal-AA "white collar work" benchmark, surpassing Kimi K2.5. Released under an MIT license, it has seen rapid adoption on platforms such as OpenRouter and Modal, even though Z.ai (formerly Zhipu AI) openly acknowledges compute constraints that affect serving capacity and pricing. Concurrently, DeepSeek has rolled out a "V4-lite" model with 1M context that emphasizes advanced attention mechanisms, and MiniMax has launched M2.5, adding to a competitive, cost-driven Chinese open-model ecosystem.
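The scaling figures above are easier to compare as ratios. The following is a minimal sketch that uses only the numbers quoted in this summary; the dictionary keys and function names are illustrative and are not taken from any Z.ai release.

```python
# Figures quoted in the summary (B = billions of parameters, T = trillions of tokens).
# PREV is the earlier model the summary compares against; GLM5 is the new release.
PREV = {"total_params_b": 355, "active_params_b": 32, "pretrain_tokens_t": 23.0}
GLM5 = {"total_params_b": 744, "active_params_b": 40, "pretrain_tokens_t": 28.5}


def growth(key):
    """Ratio of the GLM-5 figure to the earlier figure for one quantity."""
    return GLM5[key] / PREV[key]


def active_fraction(model):
    """Share of the total parameters that are active per token."""
    return model["active_params_b"] / model["total_params_b"]


def scaling_report():
    """Collect the growth ratios and active-parameter shares, rounded for display."""
    return {
        "total_growth": round(growth("total_params_b"), 2),
        "active_growth": round(growth("active_params_b"), 2),
        "tokens_growth": round(growth("pretrain_tokens_t"), 2),
        "prev_active_pct": round(100 * active_fraction(PREV), 1),
        "glm5_active_pct": round(100 * active_fraction(GLM5), 1),
    }


if __name__ == "__main__":
    for name, value in scaling_report().items():
        print(f"{name}: {value}")
```

On these numbers, total parameters grow about 2.1x while active parameters grow only 1.25x, so the share of parameters active per token falls from roughly 9.0% to roughly 5.4%. Pre-training data grows about 1.24x.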

Key takeaway

For NLP engineers and CTOs evaluating LLM adoption, GLM-5's MIT license and strong results on agentic and office-work benchmarks, together with DeepSeek's 1M-context model, signal a maturing and cost-effective open-weight landscape. Prioritize models that integrate sparse attention for efficient long-context processing, especially given widespread GPU starvation. Expect rapid iteration and competitive pricing from Chinese labs, which could significantly alter your model selection and deployment strategies.

Key insights

Chinese AI labs are rapidly advancing open-weight models, emphasizing cost-efficiency, long-context, and agentic capabilities.

Method

GLM-5 scales both parameter count and pre-training data, and integrates DeepSeek Sparse Attention (DSA) to cut the cost of long-context serving. DeepSeek's "V4-lite" relies on advanced attention mechanisms to reach 1M context, while MiniMax M2.5 focuses on agentic tasks.
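To make the idea of sparse attention concrete, here is a generic top-k sparse attention sketch for a single query in plain Python. It is an illustration of the general technique only: it is not DeepSeek's DSA implementation, and the function names and the `top_k` parameter are assumptions made for this example. For simplicity it scores every key before selecting, so it saves work only in the softmax and value-aggregation step; how a production system selects tokens cheaply is not modeled here.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_sum(weights, vectors):
    """Combine value vectors using the given attention weights."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def dense_attention(q, keys, values):
    """Standard scaled dot-product attention for one query over all keys."""
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    return weighted_sum(softmax(scores), values)


def topk_sparse_attention(q, keys, values, top_k):
    """Attend only to the top_k highest-scoring keys (illustrative, not DSA).

    Returns the output vector and the sorted indices of the keys that were kept.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [dot(q, k) * scale for k in keys]
    # Pick the indices of the top_k largest scores, then restore positional order.
    keep = sorted(range(len(keys)), key=lambda i: scores[i], reverse=True)[:top_k]
    keep.sort()
    weights = softmax([scores[i] for i in keep])
    return weighted_sum(weights, [values[i] for i in keep]), keep


if __name__ == "__main__":
    q = [1.0, 0.0]
    keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
    print(dense_attention(q, keys, values))
    print(topk_sparse_attention(q, keys, values, top_k=1))
```

With `top_k` equal to the number of keys, the sparse version reduces to dense attention. With a small `top_k`, each query aggregates over a fixed number of tokens regardless of context length, which is the general reason sparse attention is attractive for long-context serving.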

Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, AI Researcher

Editorial summary, takeaway, and curation by AIssential. Original article published by AINews.