The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin

· Source: Max Woolf's Blog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Data Science & Analytics, Software Development & Engineering · Depth: Intermediate, short

Summary

The OpenRouter AI Model Rankings, retrieved May 25, 2026, show Tencent's Hy3 preview LLM unexpectedly leading token usage by a margin of over 50% over models such as Claude, despite its seemingly inferior quality and higher effective cost. Hy3 preview, priced at $0.066/1M input tokens from SiliconFlow, has a 44% cache read cost, resulting in an effective price of $0.034/1M input tokens. In contrast, DeepSeek V4 Flash, when served directly by DeepSeek, has a 2% cache read cost, leading to a significantly lower effective price of $0.018/1M input tokens. The article highlights that 98% of LLM API costs are now input tokens, which makes prompt caching crucial and stated prices misleading. The author remains puzzled by Hy3's popularity and suggests that a single large, non-agentic app might be its primary user.
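To make the gap between stated and effective price concrete, here is a minimal Python sketch. It assumes the effective input price is a simple blend of full-price tokens and cached tokens billed at the cache read percentage; this blending formula, the function name `effective_input_price`, and the cache hit rates are illustrative assumptions, not OpenRouter's documented method. Only the $0.066 stated price and the 44% cache read cost come from the summary above.

```python
def effective_input_price(stated_price: float,
                          cache_read_ratio: float,
                          cache_hit_rate: float) -> float:
    """Blend uncached and cached input tokens into one price per 1M tokens.

    stated_price     -- listed price per 1M input tokens (USD)
    cache_read_ratio -- cached-token price as a fraction of stated_price
    cache_hit_rate   -- assumed fraction of input tokens served from cache
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    if cache_read_ratio < 0.0:
        raise ValueError("cache_read_ratio must be non-negative")
    uncached = (1.0 - cache_hit_rate) * stated_price
    cached = cache_hit_rate * stated_price * cache_read_ratio
    return uncached + cached


# Figures from the summary: Hy3 preview via SiliconFlow.
HY3_STATED_PRICE = 0.066      # USD per 1M input tokens
HY3_CACHE_READ_RATIO = 0.44   # cache reads billed at 44% of stated price

# The hit rates below are hypothetical; the summary does not state one.
for hit_rate in (0.0, 0.5, 0.9, 1.0):
    price = effective_input_price(HY3_STATED_PRICE, HY3_CACHE_READ_RATIO, hit_rate)
    print(f"hit rate {hit_rate:.0%}: ${price:.4f} per 1M input tokens")
```

Under this simplified model, the quoted $0.034 effective price would correspond to a cache hit rate of roughly 87%, although the source does not say how the figure was derived. The sketch also shows why a low cache read percentage matters more as the hit rate rises: the cached term dominates the blend.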

Key takeaway

AI engineers optimizing LLM costs must look beyond stated API prices. Effective pricing, heavily influenced by prompt caching and provider-specific cache read costs, can nearly double or halve actual spend. For DeepSeek V4 Flash, prefer DeepSeek's own API, whose 2% cache read cost significantly undercuts models like Hy3 preview. Always consult OpenRouter's effective pricing table and consider data policy implications before deployment.

Key insights

Effective LLM pricing, driven by prompt caching and high input token usage, significantly deviates from stated costs, influencing model adoption.

Best for: MLOps Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Max Woolf's Blog.