DeepSeek-V4 Pro now available on Together AI

2026-04-29 · Source: Together AI | The AI Native Cloud - Together.ai · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Advanced, medium

Summary

DeepSeek-V4 Pro, a 1.6T-parameter Mixture-of-Experts (MoE) reasoning model with 49B activated parameters, is now available on Together AI. Together AI serves it with a 512K-token context window for long-context workloads; the model-level context is 1M tokens. Three controllable reasoning modes (Non-Think, Think High, and Think Max) let users trade response speed against reasoning depth. Pricing is $2.10 per 1M input tokens, $0.20 per 1M cached input tokens, and $4.40 per 1M output tokens, so reused context costs roughly 90% less than fresh input. DeepSeek-V4 Pro is designed for demanding tasks such as code agents, document intelligence, and research synthesis, and can be deployed via Serverless Inference or Monthly Reserved infrastructure on Together AI.
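
To make the pricing concrete, here is a minimal Python sketch of the per-request arithmetic. The three rates come from the summary above; the function names, the example token counts, and the assumption that the caller already knows how many input tokens were served from cache are illustrative only and are not taken from Together AI's documentation.

```python
# Prices in USD per 1M tokens, as quoted in the summary.
INPUT_PER_M = 2.10
CACHED_INPUT_PER_M = 0.20
OUTPUT_PER_M = 4.40


def request_cost(fresh_input_tokens: int, cached_input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request.

    Assumption: the caller supplies the split between fresh and cached
    input tokens; how the provider decides what counts as cached is not
    modeled here.
    """
    total = (
        fresh_input_tokens * INPUT_PER_M
        + cached_input_tokens * CACHED_INPUT_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / 1_000_000


def cached_discount() -> float:
    """Fractional saving of the cached input rate relative to the fresh rate."""
    return 1 - CACHED_INPUT_PER_M / INPUT_PER_M


# Hypothetical workload: a 400K-token document set, a 2K-token question,
# and a 1K-token answer.
first_query = request_cost(402_000, 0, 1_000)       # nothing cached yet
repeat_query = request_cost(2_000, 400_000, 1_000)  # document set served from cache

print(f"first query:  ${first_query:.4f}")     # first query:  $0.8486
print(f"repeat query: ${repeat_query:.4f}")    # repeat query: $0.0886
print(f"cached input discount: {cached_discount():.1%}")  # cached input discount: 90.5%
```

Under these assumed token counts, almost all of the saving on the repeat query comes from the 400K tokens of stable context moving from the fresh rate to the cached rate.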

Key takeaway

For AI engineers building long-context applications, DeepSeek-V4 Pro on Together AI provides a 512K-token context window suited to complex tasks such as code agents and document intelligence. Choose a reasoning mode per workload to balance speed, reasoning depth, and cost. Take advantage of cached input pricing to significantly reduce the cost of repeated analysis over stable contexts. Consider Serverless for development and Reserved for production.

Key insights

DeepSeek-V4 Pro offers scalable, cost-effective long-context reasoning with flexible deployment and controllable reasoning modes.

Principles

Match reasoning depth to task complexity.
Cache stable contexts for cost savings.
Hybrid attention optimizes long-context serving.

In practice

Use Non-Think for simple extraction tasks.
Apply Think Max for deep research synthesis.
Cache large document sets for repeated queries.
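
The mode guidance above can be sketched as a small lookup in Python. The three mode names come from the summary; the task labels, the middle-tier mapping, and the Think High fallback are assumptions made for illustration, and the sketch deliberately does not show how a mode is passed to the API, since the summary does not specify that.

```python
from enum import Enum


class ReasoningMode(Enum):
    NON_THINK = "Non-Think"
    THINK_HIGH = "Think High"
    THINK_MAX = "Think Max"


# Hypothetical task labels. Only the first and last mappings are stated in
# the summary; the middle one is an assumed example of a mid-depth task.
MODE_BY_TASK = {
    "simple_extraction": ReasoningMode.NON_THINK,
    "code_review": ReasoningMode.THINK_HIGH,  # assumption
    "research_synthesis": ReasoningMode.THINK_MAX,
}


def pick_mode(task: str) -> ReasoningMode:
    """Return a reasoning mode for a task label.

    Unknown tasks fall back to Think High, an assumed middle ground
    between speed and depth.
    """
    return MODE_BY_TASK.get(task, ReasoningMode.THINK_HIGH)


print(pick_mode("simple_extraction").value)
print(pick_mode("research_synthesis").value)
```

Keeping the mapping in one table makes it easy to revisit as cost and latency measurements for each workload come in.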

Topics

DeepSeek-V4 Pro
Mixture-of-Experts
Long-Context AI
Together AI
Serverless Inference
Cached Input Pricing
Reasoning Modes

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by Together AI | The AI Native Cloud - Together.ai.