Three reasons why DeepSeek’s new model matters
Summary
Chinese AI firm DeepSeek has released a preview of V4, its new flagship open-source model, which processes long prompts far more efficiently thanks to architectural changes to its attention mechanism. V4 comes in two versions: V4-Pro, designed for coding and complex agent tasks, and V4-Flash, a faster, cheaper alternative. Both versions offer reasoning modes. On major benchmarks they are competitive with leading closed-source models such as Anthropic's Claude-Opus-4.6, OpenAI's GPT-5.4, and Google's Gemini-3.1, and they outperform other open-source models. V4-Pro is priced at $1.74 per million input tokens and $3.48 per million output tokens; V4-Flash is cheaper still, at $0.14 and $0.28 respectively. A key innovation is the 1-million-token context window, achieved by compressing older information and focusing attention selectively, which cuts compute by 73-90% and memory use by 90-93% compared with its predecessor, V3.2. V4 is also optimized for domestic Chinese chips such as Huawei's Ascend, a strategic shift away from dependence on Nvidia.
Key takeaway
For AI architects and NLP engineers evaluating large language models for deployment, DeepSeek V4 is a compelling open-source option that pairs frontier-level performance with much lower operating costs, especially for long-context applications. Consider evaluating V4 if you are exploring alternatives to US-made chips or need to process very long documents efficiently: its pricing is highly competitive and its context window extends to 1 million tokens.
Key insights
DeepSeek V4 combines competitive performance and low cost with a 1-million-token context window, and it is optimized for Chinese chips.
Principles
- Selective attention improves long-context efficiency.
- Open-source models can rival closed-source performance.
- Hardware optimization drives cost reduction.
Method
V4's attention mechanism compresses older information while retaining nearby text. This sharply reduces the compute and memory cost of long contexts, which is what makes 1-million-token processing practical.
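To make the idea concrete, here is a minimal sketch of how compressing older context shrinks the number of entries each new token has to attend to. It is not DeepSeek's implementation, which the summary does not describe in detail. The function name `attended_entries`, the local window size, and the block size are all hypothetical values chosen for illustration, and the result is not meant to reproduce the 73-90% and 90-93% figures reported for V4.

```python
def attended_entries(context_len: int, local_window: int, block_size: int) -> int:
    """Count the key/value entries one query sees under a toy scheme.

    Assumed scheme (illustrative only, not V4's actual design):
    - the most recent `local_window` tokens are kept uncompressed;
    - every older run of `block_size` tokens is summarized as one entry.
    """
    recent = min(context_len, local_window)
    older = context_len - recent
    # Ceiling division: a partial trailing block still needs one summary entry.
    compressed = -(-older // block_size)
    return recent + compressed


if __name__ == "__main__":
    n = 1_000_000
    dense = n  # dense attention: every prior token is a separate entry
    toy = attended_entries(n, local_window=4096, block_size=64)
    print(f"dense: {dense:,} entries, toy compressed: {toy:,} entries")
    print(f"reduction: {1 - toy / dense:.1%}")
```

The point of the sketch is the shape of the saving: with dense attention the per-token work grows linearly with context length, while under the assumed scheme it grows roughly as the window size plus the context length divided by the block size.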
In practice
- Use V4-Pro for complex coding and agent tasks.
- Use V4-Flash for faster, cost-sensitive applications.
- Consider V4 for applications that require analysis of very long documents.
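When choosing between the two versions, a rough cost estimate can help. The sketch below uses only the per-million-token prices quoted in the summary. The helper `estimate_cost` and the dictionary keys are hypothetical labels, not official API model identifiers, and real bills may differ (for example through caching discounts or pricing changes that the summary does not cover).

```python
# Prices in USD per million tokens, as quoted in the summary.
PRICES = {
    "V4-Pro": {"input": 1.74, "output": 3.48},
    "V4-Flash": {"input": 0.14, "output": 0.28},
}


def estimate_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the estimated USD cost of one request (hypothetical helper)."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Example workload: one long document plus a short answer.
    for model in PRICES:
        cost = estimate_cost(model, input_tokens=800_000, output_tokens=2_000)
        print(f"{model}: ${cost:.4f} per request")
```

For long-document workloads the input side dominates the bill, so the gap between the two versions is driven mostly by the input price.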
Topics
- DeepSeek V4
- Open-Source AI Models
- Long Context Windows
- Memory Efficiency
- Chinese AI Chips
Best for: AI Architect, NLP Engineer, Entrepreneur, AI Engineer, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by MIT Technology Review.