The mysterious Hy3 LLM is topping OpenRouter Model Rankings by a large margin
Summary
The OpenRouter AI Model Rankings, retrieved May 25, 2026, show Tencent's Hy3 preview LLM unexpectedly leading token usage by more than 50% over models such as Claude. This is despite its seemingly inferior quality and a higher effective cost than alternatives such as DeepSeek V4 Flash. Hy3 preview, priced at $0.066/1M input tokens from SiliconFlow, bills cache reads at 44% of that input price, which gives an effective price of $0.034/1M input tokens. In contrast, DeepSeek V4 Flash served directly by DeepSeek bills cache reads at 2% of its input price, for a significantly lower effective price of $0.018/1M input tokens. The article highlights that input tokens now account for 98% of LLM API costs, which makes prompt caching crucial and stated prices misleading. The author remains puzzled by Hy3's popularity and suggests that its primary user might be a single large, non-agentic app.
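To see how a cache read cost turns into an effective price, here is a minimal Python sketch. It assumes a simple blended model in which cached input tokens are billed at a fixed fraction of the list price and all other input tokens at full price. The cache hit rate is a hypothetical parameter that you would measure for your own workload. The summary does not say which hit rate underlies the quoted effective prices, OpenRouter's own methodology may differ, and any cache-write surcharge is ignored here.

```python
def effective_input_price(
    list_price: float, cache_read_fraction: float, cache_hit_rate: float
) -> float:
    """Blended USD price per 1M input tokens under a simple caching model (an assumption).

    list_price: stated price per 1M uncached input tokens.
    cache_read_fraction: cache reads billed as this fraction of list_price
        (0.44 for a 44% cache read cost, 0.02 for 2%).
    cache_hit_rate: hypothetical share of input tokens served from cache, 0..1.
    """
    if not 0.0 <= cache_hit_rate <= 1.0:
        raise ValueError("cache_hit_rate must be between 0 and 1")
    cached_cost = cache_hit_rate * cache_read_fraction * list_price
    uncached_cost = (1.0 - cache_hit_rate) * list_price
    return cached_cost + uncached_cost


# With no cache hits you pay the list price; at a 100% hit rate you pay only the cache read price.
print(effective_input_price(0.066, 0.44, 0.0))
print(effective_input_price(0.066, 0.44, 1.0))
```

Under this model the effective price falls linearly as the hit rate rises. The slope is set by (1 - cache_read_fraction), which is why a 2% cache read cost lowers the price much faster than a 44% one.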
Key takeaway
AI Engineers optimizing LLM costs should look beyond stated API prices. Effective pricing, driven largely by prompt caching and each provider's cache read cost, can nearly double or halve your actual spend. For DeepSeek V4 Flash, prefer DeepSeek's own API. Its 2% cache read cost significantly undercuts models like Hy3 preview. Always consult OpenRouter's effective pricing table and weigh data policy implications before deployment.
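Whether a lower list price or a cheaper cache read wins depends on the hit rate. The following hedged sketch uses the same simple blended model, which is an assumption. It solves for the hit rate at which two offers cost the same. The function names are illustrative. The second offer's $0.10 list price in the usage example is hypothetical and is not a figure from the article.

```python
from typing import Optional


def effective_price(list_price: float, cache_read_fraction: float, hit_rate: float) -> float:
    # Blended price per 1M input tokens: full price on misses, discounted price on hits.
    return list_price * (1.0 - hit_rate * (1.0 - cache_read_fraction))


def break_even_hit_rate(
    price_a: float, frac_a: float, price_b: float, frac_b: float
) -> Optional[float]:
    """Hit rate in [0, 1] at which offers A and B have equal effective prices, else None."""
    # Effective price is linear in the hit rate: p * (1 - h * (1 - f)).
    # Setting A equal to B and solving for h gives the expression below.
    denom = price_a * (1.0 - frac_a) - price_b * (1.0 - frac_b)
    if denom == 0.0:
        return None  # equal slopes: the offers never cross (or are identical)
    h = (price_a - price_b) / denom
    return h if 0.0 <= h <= 1.0 else None


# Offer A: $0.066 list, 44% cache reads. Offer B (hypothetical): $0.10 list, 2% cache reads.
h = break_even_hit_rate(0.066, 0.44, 0.10, 0.02)
print(f"break-even hit rate: {h:.3f}")  # → break-even hit rate: 0.557
```

Below that hit rate, the lower list price is cheaper. Above it, the 2% cache read offer wins. This is why heavily cached, input-dominated workloads favor providers with low cache read costs.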
Key insights
Effective LLM pricing, driven by prompt caching and high input token usage, deviates significantly from stated costs and influences which models get adopted.
Principles
- Input tokens account for 98% of LLM API costs.
- Prompt caching significantly alters effective pricing.
- Cache read costs vary widely by provider.
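These principles compound. When input dominates the token mix, the cache read cost ends up setting most of the bill. The following minimal sketch breaks spend into its parts. The token counts and prices in the usage example are hypothetical and were chosen only to show the mechanics. Cache writes are ignored.

```python
from dataclasses import dataclass

TOKENS_PER_UNIT = 1_000_000  # prices are quoted per 1M tokens


@dataclass
class Usage:
    uncached_input_tokens: int
    cached_input_tokens: int
    output_tokens: int


def cost_breakdown(
    usage: Usage, input_price: float, cache_read_fraction: float, output_price: float
) -> dict:
    """Split spend (USD) into uncached input, cached input, and output."""
    uncached = usage.uncached_input_tokens / TOKENS_PER_UNIT * input_price
    cached = usage.cached_input_tokens / TOKENS_PER_UNIT * input_price * cache_read_fraction
    output = usage.output_tokens / TOKENS_PER_UNIT * output_price
    total = uncached + cached + output
    input_share = (uncached + cached) / total if total else 0.0
    return {
        "uncached_input": uncached,
        "cached_input": cached,
        "output": output,
        "total": total,
        "input_share": input_share,
    }


# Hypothetical month: 50M input tokens (80% cached) and 0.5M output tokens.
usage = Usage(uncached_input_tokens=10_000_000, cached_input_tokens=40_000_000, output_tokens=500_000)
for frac in (0.44, 0.02):
    b = cost_breakdown(usage, input_price=0.10, cache_read_fraction=frac, output_price=0.40)
    print(f"cache read {frac:.0%}: total ${b['total']:.2f}, input share {b['input_share']:.0%}")
```

In this hypothetical, lowering the cache read cost from 44% to 2% more than halves the total, while the output cost stays unchanged.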
In practice
- Consult OpenRouter's effective pricing table.
- Evaluate calling providers directly with your own API key for cheaper cache reads.
- Review each provider's data policy on training with prompts.
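These checks can be folded into a simple selection step. First drop offers whose provider trains on prompts, according to its published data policy. Then rank the remaining offers by effective input price at your measured hit rate. This is a hypothetical sketch. The `Offer` fields, offer names, and prices are illustrative assumptions. In practice, the numbers would come from OpenRouter's effective pricing table or from the providers' own pricing pages.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Offer:
    name: str                   # hypothetical label for a model/provider pair
    list_price: float           # USD per 1M uncached input tokens
    cache_read_fraction: float  # cache reads billed at this fraction of list_price
    trains_on_prompts: bool     # from the provider's published data policy


def effective_price(offer: Offer, hit_rate: float) -> float:
    # Simple blended model (an assumption): discounted price on cache hits, full price on misses.
    return offer.list_price * (1.0 - hit_rate * (1.0 - offer.cache_read_fraction))


def rank_offers(
    offers: List[Offer], hit_rate: float, allow_prompt_training: bool = False
) -> List[Offer]:
    """Cheapest-first list of offers that pass the data-policy filter."""
    eligible = [o for o in offers if allow_prompt_training or not o.trains_on_prompts]
    return sorted(eligible, key=lambda o: effective_price(o, hit_rate))


offers = [
    Offer("offer-a", 0.066, 0.44, trains_on_prompts=False),
    Offer("offer-b", 0.10, 0.02, trains_on_prompts=False),
    Offer("offer-c", 0.05, 0.02, trains_on_prompts=True),
]
for o in rank_offers(offers, hit_rate=0.9):
    print(o.name, round(effective_price(o, 0.9), 4))
```

The data-policy filter runs before ranking, so the cheapest offer, if it trains on prompts, is excluded unless you explicitly opt in.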
Topics
- LLM Economics
- OpenRouter Rankings
- Prompt Caching
- DeepSeek V4 Flash
- Hy3 Preview
- API Pricing
Best for: MLOps Engineer, CTO, VP of Engineering/Data, AI Engineer, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Max Woolf's Blog.