As agentic AI pushes rivals to raise prices and cap usage, Deepseek ships a good-enough model for almost nothing
Summary
Chinese AI lab Deepseek has released V4-Pro and V4-Flash, two new open-weight models with up to 1.6 trillion parameters and a one-million-token context window. Both use a new hybrid attention architecture that sharply reduces the compute needed for long contexts: V4-Pro requires only 27% of the FLOPs and 10% of the KV cache that V3.2 does. This efficiency lets Deepseek price V4-Flash at $0.14 per million input tokens and V4-Pro at $1.74 per million input tokens, substantially undercutting competitors such as OpenAI, Google, and Anthropic. Trained on up to 33 trillion tokens and refined through on-policy distillation from in-house specialist models, the V4 models are optimized for agentic tasks and run on both Nvidia GPUs and Huawei Ascend chips.
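To make the quoted prices concrete, here is a minimal sketch of the input-side arithmetic. The two prices come from the summary; the workload (200 agent runs, each filling the full one-million-token context) and the helper name `input_cost_usd` are assumptions chosen for illustration. Output-token prices are not given here, so they are left out.

```python
# Input-token prices in USD per million tokens, as quoted in the summary.
PRICE_PER_MILLION_INPUT = {
    "V4-Flash": 0.14,
    "V4-Pro": 1.74,
}


def input_cost_usd(model: str, input_tokens: int) -> float:
    """Return the input-side cost in USD for a given number of input tokens."""
    return PRICE_PER_MILLION_INPUT[model] * input_tokens / 1_000_000


# Hypothetical workload (an assumption, not from the article):
# 200 agent runs, each sending one million input tokens.
runs = 200
tokens_per_run = 1_000_000

for model in PRICE_PER_MILLION_INPUT:
    total = input_cost_usd(model, runs * tokens_per_run)
    print(f"{model}: ${total:,.2f} for {runs} runs")
```

Under these assumptions the input bill is about $28 on V4-Flash and about $348 on V4-Pro, roughly a 12x gap between the two tiers.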
Key takeaway
For AI engineers evaluating large language models for agentic applications, Deepseek's V4-Pro and V4-Flash offer a compelling balance of performance, context length, and cost. Teams can cut API spending substantially, especially at V4-Flash's $0.14 per million input tokens, while still getting a one-million-token context window for complex tasks. Consider integrating these open-weight models to reduce API expenditures and strengthen agentic capabilities.
Key insights
Deepseek's V4 models pair one-million-token context windows with competitive performance at much lower cost, made possible by a hybrid attention architecture that cuts long-context compute.
Principles
- Hybrid attention reduces long-context compute
- Distillation refines specialist model knowledge
- Open-weight models drive cost competition
Method
Deepseek employs a hybrid attention architecture combining token compression with sparse attention, followed by on-policy distillation from multiple in-house specialist models for post-training.
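The general idea of pairing token compression with sparse attention can be sketched as a toy example. This is not Deepseek's actual design, which the summary describes only at a high level. It is a generic illustration under stated assumptions: keys are compressed by mean-pooling fixed-size blocks, the query ranks those block summaries, and full attention runs only over tokens in the top-scoring blocks. All function names and parameters (`compress_blocks`, `sparse_attend`, `block_size`, `top_k`) are hypothetical.

```python
import math


def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compress_blocks(keys, block_size):
    """Token compression (assumed scheme): mean-pool each block of keys."""
    summaries = []
    for start in range(0, len(keys), block_size):
        chunk = keys[start:start + block_size]
        dim = len(chunk[0])
        summaries.append([sum(k[d] for k in chunk) / len(chunk) for d in range(dim)])
    return summaries


def sparse_attend(query, keys, values, block_size, top_k):
    """Sparse attention (assumed scheme): attend only within the top_k blocks."""
    summaries = compress_blocks(keys, block_size)
    # Rank blocks by how well their compressed summary matches the query.
    ranked = sorted(range(len(summaries)),
                    key=lambda b: dot(query, summaries[b]),
                    reverse=True)
    chosen = sorted(ranked[:top_k])
    # Expand the chosen blocks back into token indices.
    idx = [i for b in chosen
           for i in range(b * block_size, min((b + 1) * block_size, len(keys)))]
    # Standard scaled dot-product attention, restricted to the selected tokens.
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, keys[i]) / scale for i in idx])
    dim = len(values[0])
    out = [sum(w * values[i][d] for w, i in zip(weights, idx)) for d in range(dim)]
    return out, idx
```

In this toy version the query scores one summary per block plus `top_k * block_size` individual tokens instead of every token in the context. That is the kind of saving that makes long contexts cheaper, although the real architecture and its exact savings may differ considerably.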
In practice
- Integrate V4 models for agentic workflows
- Utilize V4-Flash for cost-sensitive applications
- Deploy on Nvidia GPUs or Huawei Ascend NPUs
Topics
- Deepseek V4-Pro
- Deepseek V4-Flash
- Hybrid Attention Architecture
- Mixture-of-Experts
- Agentic Workflows
Best for: CTO, VP of Engineering/Data, AI Engineer, AI Scientist, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by The Decoder.