[AINews] DeepSeek V4 Pro (1.6T-A49B) and Flash (284B-A13B), Base and Instruct — runnable on Huawei Ascend chips

2026-04-25 · Source: Latent.Space - Www.latent.space · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure, Emerging Technologies & Innovation · Depth: Expert, extended

Summary

DeepSeek has released DSV4, a new family of large language models comprising DeepSeek-V4 Pro and DeepSeek-V4 Flash, its first major architecture refresh since December 2024. DSV4 Pro has 1.6 trillion total parameters (49 billion active) and DSV4 Flash has 284 billion total parameters (13 billion active). Both models support a 1 million token context window, achieved through two new attention techniques, Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA), which reduce FLOPs by 73% and KV cache memory by 90% compared to DeepSeek-V3.2. The models were trained on 32-33 trillion tokens and use FP4/FP8 mixed precision. Independent benchmarks place V4 Pro as the #2 open-weight reasoning model, behind Kimi K2.6, with strong performance in long-context and agentic coding tasks. DeepSeek also released DeepEP V2 and TileKernels for optimization and parallelization. The models are MIT-licensed and offered with competitive API pricing.
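To put the parameter counts and efficiency figures above in proportion, the arithmetic can be sketched in a few lines of Python. The numbers are the ones quoted in the summary; the function names and the choice to normalize the DeepSeek-V3.2 baseline to 1.0 are illustrative assumptions, not part of the release.

```python
# Parameter counts in billions, as quoted in the summary.
MODELS = {
    "V4 Pro": {"total_b": 1600, "active_b": 49},
    "V4 Flash": {"total_b": 284, "active_b": 13},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of a Mixture-of-Experts model's parameters used per token."""
    return active_b / total_b


def after_reduction(baseline: float, reduction: float) -> float:
    """Value remaining after cutting `baseline` by the given fraction."""
    return baseline * (1.0 - reduction)


for name, sizes in MODELS.items():
    print(f"{name}: {active_fraction(**sizes):.1%} of parameters active per token")

# Baseline (DeepSeek-V3.2) normalized to 1.0; this normalization is an assumption.
flops_rel = after_reduction(1.0, 0.73)  # 73% fewer FLOPs
kv_rel = after_reduction(1.0, 0.90)     # 90% less KV cache memory
print(f"Relative FLOPs: {flops_rel:.2f}, relative KV cache: {kv_rel:.2f}")
```

On these figures, roughly 3.1% of V4 Pro's parameters and 4.6% of V4 Flash's are active for any given token, and the quoted reductions leave about 0.27 of the baseline FLOPs and 0.10 of the baseline KV cache.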

Key takeaway

For AI Architects evaluating open-weight models for long-context or agentic applications, DeepSeek V4 Pro and Flash offer compelling performance and efficiency. Your teams should investigate V4's novel attention mechanisms and FP4/FP8 quantization for potential integration, especially given its 1M token context and permissive MIT license. Be mindful of the high token usage reported in some evaluations, which could raise overall task cost despite low per-token pricing.
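The point about token usage can be made concrete with a small cost calculation. The prices and token counts below are hypothetical placeholders chosen only to show the effect; they are not DeepSeek's actual pricing or measured usage.

```python
def task_cost(tokens_used: int, price_per_million: float) -> float:
    """Cost of one task given total tokens consumed and price per 1M tokens."""
    return tokens_used / 1_000_000 * price_per_million


# Hypothetical figures for illustration only.
verbose_cheap = task_cost(tokens_used=400_000, price_per_million=1.00)
terse_pricey = task_cost(tokens_used=100_000, price_per_million=3.00)

print(f"Low price, high usage:  ${verbose_cheap:.2f} per task")
print(f"High price, low usage:  ${terse_pricey:.2f} per task")
```

In this made-up scenario the model that is three times cheaper per token still costs more per task because it consumes four times as many tokens, so comparisons are more reliable when made per completed task than per token.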

Key insights

DeepSeek V4 advances open-weight long-context and agentic coding through novel attention mechanisms and efficient architecture.

Principles

Long-context efficiency is critical for open-weight model utility.
Hybrid attention systems can dramatically reduce KV cache memory.
Open technical reports foster community adoption and innovation.

Method

DeepSeek V4 employs Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) with shared KV vectors, compressed KV streams, and top-k sparse attention to achieve 1M token context with reduced memory footprint.
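The general idea of combining a compressed KV stream, shared KV vectors, and top-k sparse attention can be illustrated with a toy example. This is a generic sketch under stated assumptions, not DeepSeek's CSA or HCA: mean-pooling as the compression step, the block size, and every function name here are hypothetical choices made for illustration.

```python
import math


def compress(kv, block):
    """Mean-pool each run of `block` vectors into one (assumed compression)."""
    out = []
    for start in range(0, len(kv), block):
        chunk = kv[start:start + block]
        dim = len(chunk[0])
        out.append([sum(v[d] for v in chunk) / len(chunk) for d in range(dim)])
    return out


def topk_sparse_attention(query, kv, block=4, k=2):
    """Attend to only the k best-matching entries of a compressed KV stream.

    Each compressed vector serves as both key and value (shared KV).
    """
    compressed = compress(kv, block)
    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * c for q, c in zip(query, vec)) for vec in compressed]
    # Indices of the k highest-scoring compressed entries.
    keep = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Numerically stable softmax over the selected entries only.
    peak = max(scores[i] for i in keep)
    weights = [math.exp(scores[i] - peak) for i in keep]
    total = sum(weights)
    output = [
        sum(w / total * compressed[i][d] for w, i in zip(weights, keep))
        for d in range(len(query))
    ]
    return output, keep, len(compressed)


# Toy input: 16 two-dimensional vectors standing in for a token sequence.
kv = [[float(t % 4), float(t // 4)] for t in range(16)]
out, kept, n_compressed = topk_sparse_attention([0.0, 1.0], kv, block=4, k=2)
print(f"cache entries: {len(kv)} -> {n_compressed}, attended: {kept}")
```

In this toy setup the cache shrinks from 16 entries to 4 and the query reads only 2 of them, which is the kind of memory and compute saving the method description points to, though the real mechanisms are specified in DeepSeek's technical report rather than here.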

In practice

Use DeepSeek V4 Flash for cost-effective long-document analysis.
Evaluate V4 Pro for leading open-weight agentic coding performance.
Consider Huawei CANN compatibility to reduce dependence on NVIDIA hardware.

Topics

DeepSeek V4
Long-Context AI
Mixture-of-Experts
AI Benchmarking
Huawei Ascend Chips

Best for: AI Architect, AI Engineer, NLP Engineer, AI Scientist, Machine Learning Engineer, Director of AI/ML

Editorial summary, takeaway, and curation by AIssential. Original article published by Latent.Space - Www.latent.space.