Perplexity, CoreWeave Deal Boosts Inferencing

2026-03-04 · Source: aibusiness · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, short

Summary

CoreWeave and AI search vendor Perplexity have finalized a multiyear agreement, announced on March 4, 2026, under which CoreWeave will host Perplexity's AI inference workloads. The deal highlights the market's growing emphasis on AI inference over training. Perplexity will migrate its next-generation AI inference workloads to CoreWeave Cloud, using Nvidia's GB200 NVL72 clusters to power its AI model, Sonar, and its Search API ecosystem. Perplexity will also use CoreWeave Kubernetes Service (CKS) and W&B (Weights & Biases) Models for model management and deployment. The partnership aims to scale Perplexity's AI search and inference capabilities so it can deliver real-time responses to enterprise customers, while diversifying CoreWeave's customer base beyond its major contracts with Microsoft, OpenAI, and Meta.

Key takeaway

For CTOs and VPs of Engineering evaluating AI infrastructure, this deal underscores the strategic importance of specialized inference providers. Teams running high-volume, real-time inference workloads should consider dedicated AI cloud platforms such as CoreWeave for performance and cost efficiency, rather than relying solely on hyperscalers. This approach can secure high-performance infrastructure without building and operating it in-house.

Key insights

The AI market is shifting focus from training to inference, driving specialized cloud partnerships for real-time AI applications.

Principles

Inference is a continuous, high-volume workload.
Purpose-built AI clouds can offer performance advantages.

Method

Perplexity will migrate AI inference workloads to CoreWeave Cloud, using Nvidia GB200 NVL72 clusters for compute and CKS and W&B Models for deployment and management.

In practice

Migrate AI inference to specialized cloud providers.
Utilize Kubernetes services for AI workload management.

Topics

AI Inference
Cloud Computing
NVIDIA GPUs
AI Search
Model Deployment

Best for: CTO, VP of Engineering/Data, AI Architect, AI Product Manager, Director of AI/ML, Investor

Editorial summary, takeaway, and curation by AIssential. Original article published by aibusiness.