GPT-5.6 Leaked, Mythos Benchmark Leaks, Hermes Desktop App, Qwen 3.7 Plus, & More! AI NEWS

· Source: WorldofAI · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Software Development & Engineering, Robotics & Autonomous Systems · Depth: Intermediate, long

Summary

OpenAI's GPT-5.6 is expected to launch next week, based on hints from product leads and extensive A/B testing in ChatGPT that shows strong game generation and improved UI capabilities. It is expected to rival Mythos preview 1 while being cheaper and more token-efficient. OpenAI also significantly updated Codex, expanding it beyond coding with role-specific plugins and a new "sites" feature for interactive apps, with eventual integration into ChatGPT as the goal. Microsoft unveiled seven new AI models at Build 2026, including MAI thinking one, a Mixture of Experts model with 35 billion active parameters that performs on par with Claude Opus 4.6, and MAI code one flash, which outperforms Claude Haiku 4.5. Microsoft also accidentally leaked an estimate of 6.1 × 10^27 FLOPs for Claude Mythos's training compute, suggesting it is one of the largest AI models ever trained. Other notable releases include Alibaba's multimodal Qwen 3.7 Plus, the Hermes Agent native desktop app, and Anthropic's Claude Code updates with new /fork and CLI tools.

Key takeaway

AI developers and enterprise strategists evaluating new models should closely monitor the imminent GPT-5.6 release for its efficiency and UI generation improvements, and assess how Microsoft's new MAI models compare with competitors on reasoning and coding. Consider the Hermes Agent desktop app for local, open-source AI workflows, and use the World of AI Bench to independently validate model capabilities against vendor claims before committing to a specific solution.

Key insights

The AI landscape is rapidly evolving with major model releases, expanded platform capabilities, and new hardware.

Method

The World of AI Bench platform evaluates models using nearly 4,000 prompts and an AI judge system, providing insights on functionality, design, and code quality to guide model selection and prompt optimization.
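As a rough illustration of how judge scores on those three criteria might be rolled up into a model ranking, here is a minimal sketch. It is not the World of AI Bench implementation: the record layout, the equal weighting of criteria, the function names, and the sample values are all assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean

# The three criteria come from the summary; weighting them equally is an assumption.
CRITERIA = ("functionality", "design", "code_quality")


def aggregate_scores(judgments):
    """Average judge scores per model and criterion, then add an overall mean.

    `judgments` is a list of dicts with hypothetical keys: "model",
    "prompt_id", and one numeric score per criterion.
    """
    per_model = defaultdict(lambda: defaultdict(list))
    for judgment in judgments:
        for criterion in CRITERIA:
            per_model[judgment["model"]][criterion].append(judgment[criterion])

    summary = {}
    for model, scores in per_model.items():
        averages = {c: mean(scores[c]) for c in CRITERIA}
        averages["overall"] = mean(averages[c] for c in CRITERIA)
        summary[model] = averages
    return summary


def rank_models(summary):
    """Return model names sorted by overall score, best first."""
    return sorted(summary, key=lambda m: summary[m]["overall"], reverse=True)


# Placeholder judgments with made-up model names; these are not benchmark results.
sample = [
    {"model": "model_a", "prompt_id": 1, "functionality": 8, "design": 6, "code_quality": 7},
    {"model": "model_a", "prompt_id": 2, "functionality": 6, "design": 8, "code_quality": 7},
    {"model": "model_b", "prompt_id": 1, "functionality": 9, "design": 9, "code_quality": 9},
]
summary = aggregate_scores(sample)
ranking = rank_models(summary)
```

Keeping the per-criterion averages alongside the overall score makes it possible to see where a model is weak (for example, strong functionality but poor design) rather than relying on a single number.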

Best for: AI Engineer, NLP Engineer, CTO, AI Scientist, Machine Learning Engineer, Tech Journalist

Editorial summary, takeaway, and curation by AIssential. Original article published by WorldofAI.