DeepSeek-V4 Pro now available on Together AI
Summary
DeepSeek-V4 Pro, a 1.6T-parameter Mixture-of-Experts (MoE) reasoning model with 49B activated parameters, is now available on Together AI. It offers a 512K-token context window for long-context workloads, with a model-level context of 1M tokens. The model features controllable reasoning modes—Non-Think, Think High, and Think Max—allowing users to balance response speed with reasoning depth. Pricing is set at \$2.10 per 1M input tokens, \$0.20 per 1M cached input tokens, and \$4.40 per 1M output tokens, providing a 90% cost reduction for reused context. DeepSeek-V4 Pro is designed for demanding tasks like code agents, document intelligence, and research synthesis, and can be deployed via Serverless Inference or Monthly Reserved infrastructure on Together AI.
Key takeaway
For AI Engineers building long-context applications, DeepSeek-V4 Pro on Together AI offers a robust solution. You can utilize its 512K context window for complex tasks like code agents or document intelligence. Employ the controllable reasoning modes to optimize performance and cost per workload. Implement cached input pricing to significantly reduce expenses for repeated analysis over stable contexts. Consider Serverless for development and Reserved for production needs.
Key insights
DeepSeek-V4 Pro offers scalable, cost-effective long-context reasoning with flexible deployment and controllable reasoning modes.
Principles
- Match reasoning depth to task complexity.
- Cache stable contexts for cost savings.
- Hybrid attention optimizes long-context serving.
In practice
- Use Non-Think for simple extraction tasks.
- Apply Think Max for deep research synthesis.
- Cache large document sets for repeated queries.
Topics
- DeepSeek-V4 Pro
- Mixture-of-Experts
- Long-Context AI
- Together AI
- Serverless Inference
- Cached Input Pricing
- Reasoning Modes
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Engineer, Machine Learning Engineer, MLOps Engineer
Related on AIssential
Editorial summary, takeaway, and curation by AIssential. Original article published by Together AI | The AI Native Cloud - Together.ai.