[AINews] DeepSeek V4 Pro (1.6T-A49B) and Flash (284B-A13B), Base and Instruct — runnable on Huawei Ascend chips
Summary
DeepSeek has released DSV4, a new family of large language models comprising DeepSeek-V4 Pro and DeepSeek-V4 Flash, its first major architecture refresh since December 2024. DSV4 Pro has 1.6 trillion total parameters (49 billion active), and DSV4 Flash has 284 billion total parameters (13 billion active). Both models support a 1-million-token context window, achieved through two new techniques, Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), which reduce FLOPs by 73% and KV cache memory by 90% compared to DeepSeek-V3.2. The models were trained on 32-33 trillion tokens and use FP4/FP8 mixed precision. Independent benchmarks place V4 Pro as the #2 open-weight reasoning model, behind Kimi K2.6, with strong performance in long-context and agentic coding tasks. DeepSeek also released DeepEP V2 and TileKernels for optimization and parallelization. The models are MIT-licensed, and API pricing is competitive.
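The parameter counts and the KV cache reduction in the summary above can be turned into quick arithmetic. The sketch below uses only the figures stated in the summary; the function names are illustrative, and the KV cache baseline is expressed in relative units (1.0 for DeepSeek-V3.2) because no absolute size is given.

```python
# Parameter counts from the summary, in billions: (total, active per token).
MODELS = {
    "DeepSeek-V4 Pro": (1600, 49),
    "DeepSeek-V4 Flash": (284, 13),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a mixture-of-experts model's parameters used per token."""
    return active_b / total_b


def remaining_after_reduction(baseline: float, reduction: float) -> float:
    """Amount left after cutting `baseline` by the fraction `reduction`."""
    return baseline * (1.0 - reduction)


for name, (total_b, active_b) in MODELS.items():
    share = active_fraction(total_b, active_b)
    print(f"{name}: {share:.1%} of parameters active per token")

# Relative KV cache and FLOPs versus DeepSeek-V3.2 (baseline = 1.0),
# using the 90% and 73% reductions quoted in the summary.
print(f"KV cache: {remaining_after_reduction(1.0, 0.90):.2f}x of baseline")
print(f"FLOPs:    {remaining_after_reduction(1.0, 0.73):.2f}x of baseline")
```

Read this way, both models activate only a few percent of their weights per token, and a 90% KV cache reduction leaves roughly one tenth of the baseline memory.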
Key takeaway
For AI architects evaluating open-weight models for long-context or agentic applications, DeepSeek V4 Pro and Flash offer compelling performance and efficiency. Your teams should investigate V4's new attention mechanisms and FP4/FP8 mixed precision for potential integration, especially given the 1M-token context window and the permissive MIT license. Be mindful of the high token usage reported in some evaluations: a model that emits more tokens per task can cost more overall even when its per-token price is low.
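The caution about token usage comes down to simple arithmetic: task cost is tokens consumed multiplied by price per token. The sketch below illustrates this with entirely hypothetical token counts and prices; none of these numbers come from DeepSeek's price list or from any benchmark.

```python
def task_cost_usd(output_tokens: int, price_per_million_usd: float) -> float:
    """Cost of one task given tokens generated and a per-million-token price."""
    return output_tokens / 1_000_000 * price_per_million_usd


# HYPOTHETICAL figures, chosen only to illustrate the trade-off.
verbose_cheap = task_cost_usd(output_tokens=120_000, price_per_million_usd=2.0)
concise_pricey = task_cost_usd(output_tokens=20_000, price_per_million_usd=8.0)

print(f"verbose model, low price:   ${verbose_cheap:.2f} per task")
print(f"concise model, higher price: ${concise_pricey:.2f} per task")
```

With these made-up inputs, the model with the lower per-token price is the more expensive one per task, which is why evaluations should be compared on cost per completed task and not on list price alone.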
Key insights
DeepSeek V4 advances open-weight long-context and agentic coding through novel attention mechanisms and efficient architecture.
Principles
- Long-context efficiency is critical for open-weight model utility.
- Hybrid attention systems can dramatically reduce KV cache memory.
- Open technical reports foster community adoption and innovation.
Method
DeepSeek V4 employs Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) with shared KV vectors, compressed KV streams, and top-k sparse attention to achieve 1M token context with reduced memory footprint.
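To make the ideas of a compressed KV stream and top-k sparse attention concrete, here is a minimal sketch in plain Python. It is not DeepSeek's CSA or HCA implementation, whose details are not given here. It assumes mean pooling as the compression step and a single query vector, and every function name and parameter (`mean_pool`, `topk_compressed_attention`, `block`, `k`) is hypothetical.

```python
import math


def mean_pool(vectors, block):
    """Compress a sequence by averaging each run of `block` consecutive vectors."""
    pooled = []
    for start in range(0, len(vectors), block):
        chunk = vectors[start:start + block]
        dim = len(chunk[0])
        pooled.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return pooled


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def topk_compressed_attention(query, keys, values, block=4, k=2):
    """Attend from one query to the k best-scoring compressed KV entries.

    Illustrative assumption: keys and values are compressed by mean pooling.
    Returns the attention output and the indices of the selected entries.
    """
    ck = mean_pool(keys, block)    # compressed key stream
    cv = mean_pool(values, block)  # compressed value stream
    scale = 1.0 / math.sqrt(len(query))
    scores = [dot(query, key) * scale for key in ck]

    # Keep only the k highest-scoring compressed entries (sparse selection).
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]

    # Numerically stable softmax over the selected entries only.
    peak = max(scores[i] for i in top)
    weights = [math.exp(scores[i] - peak) for i in top]
    total = sum(weights)

    dim = len(cv[0])
    out = [sum(w / total * cv[i][d] for w, i in zip(weights, top)) for d in range(dim)]
    return out, sorted(top)


# Eight tokens compress to two cache entries when block=4.
keys = [[1.0, 0.0]] * 4 + [[0.0, 1.0]] * 4
values = [[1.0, 0.0]] * 4 + [[0.0, 2.0]] * 4
output, selected = topk_compressed_attention([10.0, 0.0], keys, values, block=4, k=1)
print(output, selected)
```

In this toy setup the cache holds one entry per block instead of one per token, and the softmax runs over only `k` entries, which is the general reason such schemes reduce both memory and compute as context length grows.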
In practice
- Utilize DeepSeek V4 Flash for cost-effective long-document analysis.
- Explore V4 Pro for leading open-weight agentic coding performance.
- Consider Huawei CANN compatibility for reduced NVIDIA dependence.
Topics
- DeepSeek V4
- Long-Context AI
- Mixture-of-Experts
- AI Benchmarking
- Huawei Ascend Chips
Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Director of AI/ML
Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space (www.latent.space).