DeepSeek V4 - almost on the frontier, a fraction of the price

Source: Simon Willison's Weblog · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Emerging Technologies & Innovation · Depth: Advanced, short

Summary

On April 24, 2026, DeepSeek AI released two preview models in its V4 series: DeepSeek-V4-Pro and DeepSeek-V4-Flash. Both are Mixture of Experts models with a 1-million-token context window, released under an MIT license. DeepSeek-V4-Pro has 1.6T total parameters with 49B active, which makes it the largest open-weights model, surpassing Kimi K2.6 and GLM-5.1. DeepSeek-V4-Flash has 284B total parameters with 13B active. The headline is aggressive pricing: DeepSeek-V4-Flash costs $0.14 per million input tokens and $0.28 per million output tokens, making it the cheapest small model, while DeepSeek-V4-Pro costs $1.74 per million input tokens and $3.48 per million output tokens, making it the most affordable large frontier model. The low prices stem from architectural improvements that reduce single-token FLOPs and KV cache size at long context lengths relative to DeepSeek-V3.2.
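To make the price gap concrete, here is a minimal Python sketch that turns the per-million-token prices quoted above into a per-request cost. The function name `request_cost` and the example workload (a full 1M-token prompt with a 2,000-token reply) are illustrative assumptions, not part of the original article, and the sketch ignores any caching discounts or other pricing tiers that may apply.

```python
# Prices in USD per million tokens, as quoted in the summary above.
PRICES = {
    "DeepSeek-V4-Flash": {"input": 0.14, "output": 0.28},
    "DeepSeek-V4-Pro": {"input": 1.74, "output": 3.48},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one request at the listed per-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: a 1M-token prompt and a 2,000-token reply.
    for model in PRICES:
        cost = request_cost(model, 1_000_000, 2_000)
        print(f"{model}: ${cost:.4f} per request")
```

Under these assumptions the input side dominates the bill for long-context requests, so the input price is the number to compare first when the prompt fills most of the context window.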

Key takeaway

For AI engineers evaluating large language models for deployment, DeepSeek-V4-Pro and DeepSeek-V4-Flash are compelling cost-performance options. They can cut inference costs substantially relative to other frontier models, especially in applications that need 1M-token contexts. Watch for quantized versions from teams such as Unsloth, which may allow local deployment on consumer hardware and reduce operating costs further.

Key insights

DeepSeek V4 models pair near-frontier performance with the lowest prices in their size classes, made possible by significant architectural efficiency gains.

Principles

Method

DeepSeek-V4 models achieve efficiency by reducing single-token FLOPs and KV cache size, particularly for 1M-token contexts, compared to prior versions like DeepSeek-V3.2.
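To illustrate why KV cache size matters so much at 1M tokens, the sketch below estimates the cache for a generic transformer with standard (grouped-query) attention, where the cache grows linearly with context length. This is not DeepSeek-V4's architecture: the layer count, head count, and head dimension are hypothetical values chosen only for illustration, and the function `kv_cache_bytes` is an assumption of this example.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Keys plus values cached for one sequence in a standard attention stack.

    The leading factor of 2 counts one key tensor and one value tensor per layer.
    bytes_per_value=2 corresponds to 16-bit storage.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical configuration, NOT taken from DeepSeek-V4 or DeepSeek-V3.2.
HYPOTHETICAL_CONFIG = {"layers": 60, "kv_heads": 8, "head_dim": 128}

if __name__ == "__main__":
    for seq_len in (8_000, 128_000, 1_000_000):
        size = kv_cache_bytes(seq_len=seq_len, **HYPOTHETICAL_CONFIG)
        print(f"{seq_len:>9,} tokens: {size / 2**30:,.1f} GiB")
```

Because the estimate scales linearly with `seq_len`, any architectural change that shrinks the per-token cache pays off most at the longest contexts, which is consistent with the emphasis on 1M-token workloads above.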

Best for: CTO, VP of Engineering/Data, AI Engineer, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Simon Willison's Weblog.