Baseten CEO Tuhin Srivastava on Custom Models, and Building the Inference Cloud

2026-05-01 · Source: No Priors: AI, Machine Learning, Tech, & Startups · Field: Technology & Digital — Artificial Intelligence & Machine Learning, Cloud Computing & IT Infrastructure · Depth: Intermediate, extended

Summary

Baseten, an AI inference cloud provider, has grown 30x over the past year and projects more than a billion dollars in revenue, driven by widespread AI adoption and rising demand for custom model inference. The company argues that the AI application layer will persist because application companies hold unique user signals and specialized workflows that frontier model companies cannot easily capture. Baseten primarily serves AI-native application companies, which in turn serve enterprises; this gives it a feedback loop on enterprise requirements such as data retention and deployment specifications. Custom model inference accounts for over 95% of Baseten's tokens, and customers often modify open-source models to improve quality or performance. The AI compute market faces a severe, multi-year supply crunch: Baseten runs at mid-90s percent utilization across 90 clusters in 18 clouds, which underscores the strategic importance of compute access and the need for significant capital investment.

Key takeaway

CTOs and VPs of Engineering navigating the AI landscape should recognize that the strategic advantage lies in securing compute capacity and developing specialized, custom models. Teams should prioritize post-training capabilities and build robust software layers around inference to create sticky, high-value solutions. Expect significant capital expenditure and long-term contracts (3-5 years) to secure GPU supply, because the market faces a multi-year crunch and new providers bring operational challenges.

Key insights

The AI inference market is experiencing explosive growth, driven by custom models and a severe, persistent compute supply crunch.

Principles

User signal and specialized workflows secure the AI application layer.
Lower AI inference costs increase total consumption of intelligence (Jevons paradox).
Software layers are critical for stickiness in AI inference services.
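
The Jevons paradox principle above can be illustrated with a little arithmetic. The sketch below assumes a constant-elasticity demand curve, which is a textbook simplification and not something stated in the episode; the function names, baseline figures, and elasticity values are all hypothetical. It shows that when demand is elastic (elasticity above 1), cutting the price per token raises total spend, while inelastic demand (elasticity below 1) lowers it.

```python
def tokens_demanded(price: float, elasticity: float,
                    base_price: float = 1.0, base_tokens: float = 100.0) -> float:
    """Constant-elasticity demand: Q = Q0 * (P / P0) ** (-elasticity).

    All parameters are illustrative assumptions, not figures from the source.
    """
    return base_tokens * (price / base_price) ** (-elasticity)


def total_spend(price: float, elasticity: float) -> float:
    """Total spend is price per token times tokens demanded at that price."""
    return price * tokens_demanded(price, elasticity)


if __name__ == "__main__":
    # Compare spend before and after halving the price per token.
    for elasticity in (0.5, 1.5):
        before = total_spend(1.0, elasticity)
        after = total_spend(0.5, elasticity)
        print(f"elasticity={elasticity}: spend {before:.1f} -> {after:.1f}")
```

Under these assumptions, halving the price raises total spend only in the elastic case, which is the regime the Jevons paradox describes. Whether inference demand actually behaves this way is an empirical question the sketch does not answer.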

Method

Companies should first validate product-market fit with best-in-class models, then optimize for better, faster, and cheaper custom model inference using post-training and specialized data.

In practice

Prioritize custom model development for unique user signals.
Invest in post-training and fine-tuning for specialized use cases.
Diversify compute access across multiple cloud providers.

Topics

AI Inference Infrastructure
Custom AI Models
Model Post-Training
Open-Source Models
Enterprise AI Adoption

Best for: CTO, VP of Engineering/Data, Director of AI/ML, AI Architect, MLOps Engineer

Editorial summary, takeaway, and curation by AIssential. Original article published by No Priors: AI, Machine Learning, Tech, & Startups.