Z.ai GLM-5: New SOTA Open Weights LLM
Summary
Z.ai (Zhipu AI) has launched GLM-5, a significant update to its large language model that scales from 355B to 744B total parameters (32B to 40B active) and increases pre-training data from 23T to 28.5T tokens. Positioned as an Opus-class model, it integrates DeepSeek Sparse Attention (DSA) to reduce deployment costs while preserving long-context capacity, offering a 200K-token context window and up to 128K output tokens. GLM-5 claims top scores on BrowseComp and Vending Bench 2 and leads the GDPVal-AA "white collar work" benchmark, surpassing Kimi K2.5. Released under an MIT license, it has seen rapid adoption on platforms such as OpenRouter and Modal, even as Zhipu AI openly acknowledges compute constraints affecting serving capacity and pricing. Concurrently, DeepSeek has rolled out a "V4-lite" model with 1M context that emphasizes advanced attention mechanisms, and MiniMax has launched M2.5, adding to a competitive, cost-driven Chinese open model ecosystem.
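To put the scaling figures in perspective, the short sketch below works out the ratios implied by the numbers quoted in the summary. It assumes the 32B active figure belongs to the 355B predecessor and the 40B active figure to the 744B GLM-5; the function and variable names are illustrative only.

```python
# Figures taken from the summary above (parameters in billions, tokens in trillions).
PREV_TOTAL_B, PREV_ACTIVE_B = 355, 32   # assumed pairing: previous model
NEW_TOTAL_B, NEW_ACTIVE_B = 744, 40     # assumed pairing: GLM-5
PREV_TOKENS_T, NEW_TOKENS_T = 23.0, 28.5


def active_fraction(active_b: float, total_b: float) -> float:
    """Share of total parameters that are active per token."""
    return active_b / total_b


prev_fraction = active_fraction(PREV_ACTIVE_B, PREV_TOTAL_B)
new_fraction = active_fraction(NEW_ACTIVE_B, NEW_TOTAL_B)
param_growth = NEW_TOTAL_B / PREV_TOTAL_B
token_growth = NEW_TOKENS_T / PREV_TOKENS_T - 1.0

print(f"active share, previous model: {prev_fraction:.1%}")
print(f"active share, GLM-5:          {new_fraction:.1%}")
print(f"total parameter growth:       {param_growth:.2f}x")
print(f"pre-training token growth:    {token_growth:.1%}")
```

Under these assumptions the active share falls from about 9.0% to about 5.4% while total parameters roughly double, so per-token compute grows far more slowly than model size.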
Key takeaway
For NLP engineers and CTOs evaluating LLM adoption, GLM-5's MIT license and strong performance on agentic and office work benchmarks, coupled with DeepSeek's 1M context capabilities, signal a maturing and cost-effective open-source landscape. You should prioritize models integrating sparse attention for efficient long-context processing, especially given widespread GPU starvation. Be prepared for rapid iteration and competitive pricing from Chinese labs, which could significantly alter your model selection and deployment strategies.
Key insights
Chinese AI labs are rapidly advancing open-weight models, emphasizing cost-efficiency, long-context, and agentic capabilities.
Principles
- Sparse attention reduces LLM deployment cost.
- Open-weight models are rapidly closing the capability gap with closed frontier models.
- Compute scarcity impacts even major AI players.
Method
GLM-5 scales parameters and pre-training data, integrating DeepSeek Sparse Attention to optimize long-context serving. DeepSeek V4-lite uses advanced attention for 1M context, while MiniMax M2.5 focuses on agentic tasks.
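The general idea behind sparse attention can be illustrated with a minimal, pure-Python sketch. This is a generic top-k selection scheme, not the DeepSeek Sparse Attention implementation, whose details are not given here; all function names are hypothetical. Each query attends only to the `k` highest-scoring keys instead of the full context.

```python
import math


def _softmax_weighted_sum(scores, values):
    """Softmax over scores, then the weighted average of the value vectors."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]  # subtract max for stability
    total = sum(exps)
    dim = len(values[0])
    return [sum(e * v[d] for e, v in zip(exps, values)) / total for d in range(dim)]


def _scaled_scores(query, keys):
    """Scaled dot-product score of one query against every key."""
    scale = 1.0 / math.sqrt(len(query))
    return [scale * sum(q * k for q, k in zip(query, key)) for key in keys]


def dense_attention(query, keys, values):
    """Baseline: the query attends to every key."""
    return _softmax_weighted_sum(_scaled_scores(query, keys), values)


def topk_sparse_attention(query, keys, values, k):
    """Generic top-k sparse attention: keep only the k best-scoring keys."""
    scores = _scaled_scores(query, keys)
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    return _softmax_weighted_sum([scores[i] for i in top], [values[i] for i in top])


query = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

sparse_out = topk_sparse_attention(query, keys, values, k=1)
full_out = topk_sparse_attention(query, keys, values, k=len(keys))
dense_out = dense_attention(query, keys, values)
```

With `k` equal to the number of keys the result matches dense attention, and with `k=1` the output is simply the value of the best-matching key. Note that this sketch still scores every key with a full dot product, so it only trims the softmax and value aggregation; a production design would presumably pair the selection step with a cheaper scoring mechanism to realize the cost savings.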
In practice
- Explore GLM-5 for agentic engineering and long-horizon tasks.
- Favor models that integrate DeepSeek Sparse Attention for cost-effective long-context LLM deployment.
- Consider Chinese open models for high capability at lower cost.
Topics
- Large Language Models
- DeepSeek Sparse Attention
- AI Agents
- Video Generation
- Model Evaluation
Code references
- zai-org/GLM-5
- VincentKaufmann/noapi-google-search-mcp
- pytorch/ao
- PranavDeepakSathya/theCudaBender
- Parapet-Tech/parapet
Best for: NLP Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, AI Researcher
Editorial summary, takeaway, and curation by AIssential. Original article published by AINews.