Baseten CEO Tuhin Srivastava on Custom Models and Building the Inference Cloud
Summary
Baseten, an AI inference cloud provider, has grown 30x over the past year and projects over a billion dollars in revenue, driven by the widespread adoption of AI and rising demand for custom model inference. The company argues that the AI application layer will persist because applications own unique user signals and specialized workflows, which makes it difficult for frontier model companies to fully capture this market. Baseten primarily serves AI-native application companies, which in turn serve enterprises; this gives Baseten a feedback loop on enterprise requirements such as data retention and deployment specifications. Custom model inference currently dominates, accounting for over 95% of Baseten's tokens, and customers often modify open-source models to improve quality or performance. The AI compute market faces a severe, multi-year supply crunch: Baseten operates at mid-90s percent utilization across 90 clusters in 18 clouds, which underscores the strategic importance of access to compute and the need for significant capital investment.
Key takeaway
For CTOs and VPs of Engineering navigating the AI landscape, the strategic advantage lies in securing compute capacity and developing specialized, custom models. Your teams should prioritize investing in post-training capabilities and building robust software layers around inference to create sticky, high-value solutions. Be prepared for significant capital expenditure and long-term contracts (3-5 years) to secure the necessary GPU supply, because the market faces a multi-year supply crunch and newer providers bring operational challenges.
Key insights
The AI inference market is experiencing explosive growth, driven by custom models and a severe, persistent compute supply crunch.
Principles
- User signal and specialized workflows secure the AI application layer.
- Cost reduction in AI inference increases intelligence consumption (Jevons Paradox).
- Software layers are critical for stickiness in AI inference services.
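The Jevons Paradox principle above can be made concrete with a small worked example. This is a minimal sketch under a constant-elasticity demand assumption; the function names, the normalized baseline price, and the elasticity values are all hypothetical and are not figures from the interview. It shows that token consumption rises whenever price falls, while total spend rises only when demand elasticity is greater than 1, which is the regime the paradox usually refers to.

```python
def tokens_demanded(price, elasticity, base_price=1.0, base_tokens=1.0):
    """Constant-elasticity demand (an illustrative assumption, not Baseten data).

    Tokens consumed scale as (price / base_price) ** -elasticity.
    """
    return base_tokens * (price / base_price) ** (-elasticity)


def total_spend(price, elasticity):
    """Total spend on inference = unit price * tokens consumed."""
    return price * tokens_demanded(price, elasticity)


if __name__ == "__main__":
    halved_price = 0.5  # hypothetical: inference cost per token drops by half
    for elasticity in (0.5, 1.0, 2.0):  # hypothetical demand elasticities
        tokens = tokens_demanded(halved_price, elasticity)
        spend = total_spend(halved_price, elasticity)
        print(f"elasticity={elasticity}: tokens x{tokens:.2f}, spend x{spend:.2f}")
```

Under these assumptions, halving the price with an elasticity of 2 quadruples token consumption and doubles total spend, whereas an elasticity below 1 still increases consumption but reduces total spend.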
Method
Companies should first validate product-market fit with best-in-class models, then optimize for better, faster, and cheaper custom model inference using post-training and specialized data.
In practice
- Prioritize custom model development for unique user signals.
- Invest in post-training and fine-tuning for specialized use cases.
- Diversify compute access across multiple cloud providers.
Topics
- AI Inference Infrastructure
- Custom AI Models
- Model Post-Training
- Open-Source Models
- Enterprise AI Adoption
Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Architect, MLOps Engineer
Editorial summary, takeaway, and curation by AIssential. Original article published by No Priors: AI, Machine Learning, Tech, & Startups.