GPU Clouds, Aggregators, and the New Economics of AI Compute
Summary
Hugo Shi, co-founder and CTO of Saturn Cloud, provides a strategic overview of the cloud GPU market, segmenting providers into hyperscalers, full-service GPU clouds, bare metal/concierge providers, and GPU aggregators. He details how organizations choose providers based on security, managed services, and cost, highlighting the varying capabilities in compute, orchestration (Kubernetes/Slurm), storage, and networking. The discussion covers the trade-offs of workload portability in "Kubernetes-native" stacks and strategies for mitigating "data gravity." Shi also addresses current GPU supply dynamics, the increasing availability of on-demand capacity for older chips like H100s as newer ones like GB300s roll out, and AMD's maturing ecosystem as a competitor to NVIDIA. He notes patterns for separating training and inference workloads across providers, the continued relevance of traditional ML, and usage variations in domains like biotech, concluding with predictions on market consolidation, full-stack GPU cloud experiences, and financial-style GPU marketplaces.
Key takeaway
For CTOs and VPs of Engineering evaluating cloud GPU strategies, recognize that the market offers a spectrum of providers with vastly different service levels and cost structures. Prioritize security and managed service requirements first, then explore specialized GPU clouds or aggregators for potential cost savings on H100s and other previous-generation GPUs. Be prepared to manage more infrastructure components when opting for lower-cost, less-managed providers, and consider AMD's improving ecosystem as a viable alternative to NVIDIA for future deployments.
Key insights
GPU cloud providers offer diverse services and pricing, necessitating careful evaluation based on security, managed services, and workload portability.
Principles
- GPU scarcity is easing, increasing on-demand capacity for prior-generation chips.
- AMD's ROCm stack and PyTorch integration are improving, fostering competition.
- Traditional ML remains vital for specific use cases, often outperforming LLMs in latency and cost.
Method
Assess GPU provider needs by prioritizing security posture and required managed services (Kubernetes, Slurm, storage, networking) before considering cost, as capabilities vary significantly across hyperscalers, full-service GPU clouds, and aggregators.
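As a rough illustration of the ordering described above (filter on security and managed services first, compare cost only afterwards), here is a minimal Python sketch. The `Provider` fields, the provider names, the certification labels, and the hourly prices are all hypothetical placeholders invented for this example. They are not figures from the episode or from any real vendor.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    """Hypothetical provider record; every field is illustrative only."""
    name: str
    category: str                # e.g. "hyperscaler", "gpu_cloud", "aggregator"
    security_certs: frozenset    # certifications the provider holds
    managed_services: frozenset  # services the provider runs for you
    hourly_price: float          # made-up GPU price per hour


def shortlist(providers, required_certs, required_services):
    """Drop providers that miss any hard requirement, then sort by price."""
    eligible = [
        p for p in providers
        if required_certs <= p.security_certs
        and required_services <= p.managed_services
    ]
    # Cost is only used to rank providers that already passed the filters.
    return sorted(eligible, key=lambda p: p.hourly_price)


# Placeholder data, not real vendors or real prices.
providers = [
    Provider("hyperscaler-a", "hyperscaler",
             frozenset({"soc2", "hipaa"}),
             frozenset({"kubernetes", "slurm", "object_storage"}), 4.0),
    Provider("gpu-cloud-b", "gpu_cloud",
             frozenset({"soc2"}),
             frozenset({"kubernetes", "object_storage"}), 2.5),
    Provider("aggregator-c", "aggregator",
             frozenset(), frozenset(), 1.5),
]

strict = shortlist(providers, {"soc2"}, {"kubernetes"})
relaxed = shortlist(providers, set(), set())

print([p.name for p in strict])
print([p.name for p in relaxed])
```

With the stricter requirements the cheapest placeholder provider drops out before price is ever compared, which is the point of putting security and managed services ahead of cost.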
In practice
- Split workloads across providers, for example training on a hyperscaler and inference on a GPU cloud, to optimize cost.
- Use GPU aggregators for the lowest prices when security requirements are lighter.
- On bare-metal clouds, deploy open-source services on Kubernetes to cover storage and registry needs that the provider does not manage.
Topics
- GPU Cloud Market
- GPU Orchestration
- Data Gravity
- AI/ML Infrastructure
- Cloud Workload Portability
Best for: CTO, Investor, VP of Engineering/Data, Machine Learning Engineer, AI Architect, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by AI Engineering Podcast.