Three reasons why DeepSeek’s new model matters

Source: MIT Technology Review · Field: Technology & Digital (Artificial Intelligence & Machine Learning, Robotics & Autonomous Systems, Emerging Technologies & Innovation) · Depth: Intermediate, medium

Summary

Chinese AI firm DeepSeek has released a preview of V4, its new flagship open-source model, which processes long prompts far more efficiently thanks to architectural changes to its attention mechanism. V4 comes in two versions: V4-Pro, designed for coding and complex agent tasks, and V4-Flash, a faster, cheaper alternative. Both versions offer reasoning modes. On major benchmarks they are competitive with leading closed-source models such as Anthropic's Claude-Opus-4.6, OpenAI's GPT-5.4, and Google's Gemini-3.1, and they outperform other open-source models. V4-Pro is priced at $1.74 per million input tokens and $3.48 per million output tokens; V4-Flash is cheaper still, at $0.14 and $0.28 respectively. A key innovation is the 1-million-token context window, achieved by compressing older information and focusing attention selectively. Compared with its predecessor, V3.2, this cuts computing power by 73-90% and memory use by 90-93%. Notably, V4 is also optimized for domestic Chinese chips such as Huawei's Ascend, marking a strategic shift away from dependence on Nvidia.
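To make the per-token prices concrete, here is a minimal cost sketch. It assumes the prices quoted in the summary still apply and that billing is a simple linear function of token counts; the `PRICES` table and the `request_cost` helper are hypothetical names used for illustration, not part of any DeepSeek API.

```python
# USD per million tokens, as quoted in the summary above (may change over time).
PRICES = {
    "V4-Pro": {"input": 1.74, "output": 3.48},
    "V4-Flash": {"input": 0.14, "output": 0.28},
}


def request_cost(model, input_tokens, output_tokens):
    """Estimate the USD cost of one request, assuming purely linear pricing."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # A prompt that fills the full 1-million-token window, plus a 2,000-token reply.
    for model in PRICES:
        print(model, round(request_cost(model, 1_000_000, 2_000), 5))
```

Under these assumptions, a request that fills the whole context window and returns a 2,000-token answer costs roughly $1.75 on V4-Pro and about $0.14 on V4-Flash.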

Key takeaway

For AI architects and NLP engineers evaluating large language models for deployment, DeepSeek V4 is a compelling open-source option that pairs frontier-level performance with much lower operating costs, especially for long-context applications. Consider integrating V4 if you are exploring alternatives to US-made chips or need to process very long texts efficiently: its pricing is highly competitive and its long-context capabilities are robust.

Key insights

DeepSeek V4 combines competitive performance, low cost, and a 1-million-token context window, and it is optimized for Chinese chips.

Principles

Method

V4's attention mechanism compresses older information while retaining nearby text, significantly reducing computational and memory costs for long context windows, enabling 1-million-token processing.
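The general idea of keeping nearby text intact while compressing older context can be illustrated with a toy sketch. This is not DeepSeek's actual mechanism, which the summary does not describe in detail. As a stand-in, it mean-pools older vectors in fixed-size blocks and leaves a recent window untouched; `compress_context`, `window`, and `block` are hypothetical names chosen for illustration.

```python
def compress_context(vectors, window=4, block=4):
    """Toy stand-in: mean-pool older vectors per block, keep the last `window` as-is.

    `vectors` is a list of equal-length lists of floats, one per token,
    ordered from oldest to newest.
    """
    split = max(len(vectors) - window, 0)
    older, recent = vectors[:split], vectors[split:]

    compressed = []
    for start in range(0, len(older), block):
        chunk = older[start:start + block]
        dim = len(chunk[0])
        # Average each dimension across the tokens in this block.
        compressed.append([sum(v[i] for v in chunk) / len(chunk) for i in range(dim)])

    # Older context is now one entry per block; recent tokens are unchanged.
    return compressed + recent


if __name__ == "__main__":
    tokens = [[float(i), 1.0] for i in range(20)]
    cache = compress_context(tokens, window=4, block=4)
    print(len(tokens), "->", len(cache))
```

In this toy setup, 20 stored entries shrink to 8: four pooled summaries of the 16 older tokens plus the 4 most recent tokens at full detail. The saving grows with context length, which is the intuition behind the reported memory reductions, although the real design is likely far more sophisticated.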

Best for: AI Architect, NLP Engineer, Entrepreneur, AI Engineer, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by MIT Technology Review.