GLM-5.2 is probably the most powerful text-only open weights LLM

2026-06-17 · Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning · Depth: Intermediate, quick

Summary

Z.ai, a Chinese AI lab, released GLM-5.2, a 753B-parameter, 1.51TB text-only open-weights LLM, under an MIT license on June 16th, 2026. This Mixture of Experts model has 40B active parameters and a 1 million token context window, up from GLM-5.1's 200,000. GLM-5.2 holds the top position on the Artificial Analysis Intelligence Index v4.1 with a score of 51, ahead of models like MiniMax-M3 and DeepSeek V4 Pro. It also ranks second on the Code Arena WebDev leaderboard, showing strong performance in agentic coding workflows despite lacking image input. However, the model is notably token-hungry, using 43k output tokens per Intelligence Index task. Its pricing on OpenRouter is competitive at $1.40/million input tokens and $4.40/million output tokens. Its creative SVG generation is inconsistent: it excels with a pelican but fails to match GLM-5.1's opossum.
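
As a rough sanity check on the size figures above, the parameter count and weight size can be divided to estimate storage per parameter. This is only a sketch: it assumes "1.51TB" means decimal terabytes and covers the weights alone, and the summary does not state the storage format.

```python
# Figures quoted in the summary above.
TOTAL_PARAMS = 753e9      # 753B parameters
WEIGHTS_BYTES = 1.51e12   # 1.51TB, assuming decimal terabytes (10**12 bytes)

# Average storage per parameter, in bytes and in bits.
bytes_per_param = WEIGHTS_BYTES / TOTAL_PARAMS
bits_per_param = bytes_per_param * 8

print(f"{bytes_per_param:.2f} bytes per parameter (~{bits_per_param:.0f} bits)")
```

Under those assumptions the result is about 2 bytes per parameter, which would be consistent with 16-bit weights.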

Key takeaway

For AI Scientists and Machine Learning Engineers evaluating powerful open-weights LLMs, GLM-5.2 presents a compelling option with its leading benchmark scores and 1 million token context window. You should carefully weigh its competitive pricing against its higher token consumption, which could impact operational costs for high-volume tasks. Prioritize direct testing for your specific creative or agentic coding workflows, as benchmark leadership does not guarantee consistent performance across all use cases.

Key insights

GLM-5.2 is a powerful open-weights LLM with a 1 million token context window. It leads benchmarks but uses more output tokens to get there.

Principles

Open-weights LLMs can achieve leading benchmark performance.
Larger context windows extend what a model can handle in a single prompt.
Benchmark leadership doesn't guarantee consistent creative output.

In practice

Evaluate token consumption alongside benchmark scores.
Test creative generation for specific use cases.
Consider cost-effectiveness for high-volume tasks.
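
The trade-off between per-token price and token consumption can be sketched with a quick estimate. The prices and the 43k output tokens per task come from the summary above. The function name `estimate_cost`, the 10,000 input tokens per task, and the 100,000 tasks per month are hypothetical placeholders, and real workloads may use far fewer or more output tokens than a benchmark task.

```python
# Prices quoted above for OpenRouter, in US dollars per million tokens.
INPUT_PRICE_PER_M = 1.40
OUTPUT_PRICE_PER_M = 4.40


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated dollar cost of one request at the quoted prices."""
    input_cost = (input_tokens / 1e6) * INPUT_PRICE_PER_M
    output_cost = (output_tokens / 1e6) * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# 43,000 output tokens per task is the Intelligence Index figure from the summary.
# The 10,000 input tokens and 100,000 tasks per month are made-up placeholders.
per_task = estimate_cost(input_tokens=10_000, output_tokens=43_000)
monthly = per_task * 100_000

print(f"per task: ${per_task:.4f}")
print(f"per 100,000 tasks: ${monthly:,.2f}")
```

With these placeholders, output tokens account for about $0.19 of the roughly $0.20 per task, so output volume rather than prompt size dominates the bill.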

Topics

Large Language Models
Open-Weights LLMs
GLM-5.2
AI Benchmarking
Code Generation
Mixture-of-Experts

Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.